Search | arXiv e-print repository

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick , et al. (635 additional authors not shown)

Abstract: We explore the bound neutrons decay into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation mode… ▽ More We explore the bound neutrons decay into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.17781 [pdf, other]

Dissipation-induced bound states as a two-level system

Authors: Hong Peng Zhang, Zhi Song

Abstract: Potential wells are employed to constrain quantum particles into forming discrete energy levels, acting as artificial few-level systems. In contrast, an anti-parity-time ($\mathcal{PT}$) symmetric system can have a single pair of real energy levels, while all the remaining levels are unstable due to the negative imaginary part of the energy. In this work, we investigate the formation of bound stat… ▽ More Potential wells are employed to constrain quantum particles into forming discrete energy levels, acting as artificial few-level systems. In contrast, an anti-parity-time ($\mathcal{PT}$) symmetric system can have a single pair of real energy levels, while all the remaining levels are unstable due to the negative imaginary part of the energy. In this work, we investigate the formation of bound states in a tight-binding chain induced by a harmonic imaginary potential. Exact solutions show that the real parts of energy levels are equidistant, while the imaginary parts are semi-negative definite and equidistant. This allows for the formation of an effective two-level system. For a given initial state with a wide range of profiles, the evolved state always converges to a superposition of two stable eigenstates. In addition, these two states are orthogonal under the Dirac inner product and can be mutually switched by applying a $π$ pulse of a linear field. Our finding provides an alternative method for fabricating quantum devices through dissipation. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 7 pages, 5 figures

arXiv:2405.17612 [pdf, ps, other]

A note on the error analysis of data-driven closure models for large eddy simulations of turbulence

Authors: Dibyajyoti Chakraborty, Shivam Barwey, Hong Zhang, Romit Maulik

Abstract: In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation prediction must be close to that of a subsampled direct numerical simulation, we retrieve an upper bound for the prediction error when utilizing a data-driven closure model. We… ▽ More In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation prediction must be close to that of a subsampled direct numerical simulation, we retrieve an upper bound for the prediction error when utilizing a data-driven closure model. We also demonstrate that this error is significantly affected by the time step size and the Jacobian which play a role in amplifying the initial one-step error made by using the closure. Our analysis also shows that the error propagates exponentially with rollout time and the upper bound of the system Jacobian which is itself influenced by the Jacobian of the closure formulation. These findings could enable the development of new regularization techniques for ML models based on the identified error-bound terms, improving their robustness and reducing error propagation. △ Less

Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17403 [pdf, other]

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Authors: Kai Wang, Yukun Zhou, Mingjia Shi, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

Abstract: Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many… ▽ More Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many concentrated in the convergence area. iii) The concentrated steps provide limited benefits for diffusion training. To address this, we design an asymmetric sampling strategy that reduces the frequency of steps from the convergence area while increasing the sampling probability for steps from other areas. Additionally, we propose a weighting strategy to emphasize the importance of time steps with rapid-change process increments. As a plug-and-play and architecture-agnostic approach, SpeeD consistently achieves 3-times acceleration across various diffusion architectures, datasets, and tasks. Notably, due to its simple design, our approach significantly reduces the cost of diffusion model training with minimal overhead. Our research enables more researchers to train diffusion models at a lower cost. △ Less

Submitted 27 May, 2024; originally announced May 2024.

ACM Class: I.2

arXiv:2405.17393 [pdf, other]

EASI-Tex: Edge-Aware Mesh Texturing from Single Image

Authors: Sai Raj Kishore Perla, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang

Abstract: We present a novel approach for single-image mesh texturing, which employs a diffusion model with judicious conditioning to seamlessly transfer an object's texture from a single RGB image to a given 3D mesh object. We do not assume that the two objects belong to the same category, and even if they do, there can be significant discrepancies in their geometry and part proportions. Our method aims to… ▽ More We present a novel approach for single-image mesh texturing, which employs a diffusion model with judicious conditioning to seamlessly transfer an object's texture from a single RGB image to a given 3D mesh object. We do not assume that the two objects belong to the same category, and even if they do, there can be significant discrepancies in their geometry and part proportions. Our method aims to rectify the discrepancies by conditioning a pre-trained Stable Diffusion generator with edges describing the mesh through ControlNet, and features extracted from the input image using IP-Adapter to generate textures that respect the underlying geometry of the mesh and the input texture without any optimization or training. We also introduce Image Inversion, a novel technique to quickly personalize the diffusion model for a single concept using a single image, for cases where the pre-trained IP-Adapter falls short in capturing all the details from the input image faithfully. Experimental results demonstrate the efficiency and effectiveness of our edge-aware single-image mesh texturing approach, coined EASI-Tex, in preserving the details of the input texture on diverse 3D objects, while respecting their geometry. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: ACM Transactions on Graphics (Proceedings of SIGGRAPH), 2024. Project Page: https://sairajk.github.io/easi-tex/

arXiv:2405.17336 [pdf, other]

XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser

Authors: Xianfu Cheng, Hang Zhang, Jian Yang, Xiang Li, Weixiao Zhou, Kui Wu, Fei Liu, Wei Zhang, Tao Sun, Tongliang Li, Zhoujun Li

Abstract: In the domain of document AI, semi-structured form parsing plays a crucial role. This task leverages techniques from key information extraction (KIE), dealing with inputs that range from plain text to intricate modal data comprising images and structural layouts. The advent of pre-trained multimodal models has driven the extraction of key information from form documents in different formats such a… ▽ More In the domain of document AI, semi-structured form parsing plays a crucial role. This task leverages techniques from key information extraction (KIE), dealing with inputs that range from plain text to intricate modal data comprising images and structural layouts. The advent of pre-trained multimodal models has driven the extraction of key information from form documents in different formats such as PDFs and images. Nonetheless, the endeavor of form parsing is still encumbered by notable challenges like subpar capabilities in multi-lingual parsing and diminished recall in contexts rich in text and visuals. In this work, we introduce a simple but effective \textbf{M}ultimodal and \textbf{M}ultilingual semi-structured \textbf{FORM} \textbf{PARSER} (\textbf{XFormParser}), which is anchored on a comprehensive pre-trained language model and innovatively amalgamates semantic entity recognition (SER) and relation extraction (RE) into a unified framework, enhanced by a novel staged warm-up training approach that employs soft labels to significantly refine form parsing accuracy without amplifying inference overhead. Furthermore, we have developed a groundbreaking benchmark dataset, named InDFormBench, catering specifically to the parsing requirements of multilingual forms in various industrial contexts. Through rigorous testing on established multilingual benchmarks and InDFormBench, XFormParser has demonstrated its unparalleled efficacy, notably surpassing the state-of-the-art (SOTA) models in RE tasks within language-specific setups by achieving an F1 score improvement of up to 1.79\%. Our framework exhibits exceptionally improved performance across tasks in both multi-language and zero-shot contexts when compared to existing SOTA benchmarks. The code is publicly available at https://github.com/zhbuaa0/layoutlmft. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 10 pages, 3 figures, 6 tables

arXiv:2405.17315 [pdf, other]

All-day Depth Completion

Authors: Vadim Ezhov, Hyoungseob Park, Zhaoyang Zhang, Rishi Upadhyay, Howard Zhang, Chethan Chinder Chandrappa, Achuta Kadambi, Yunhao Ba, Julie Dorsey, Alex Wong

Abstract: We propose a method for depth estimation under different illumination conditions, i.e., day and night time. As photometry is uninformative in regions under low-illumination, we tackle the problem through a multi-sensor fusion approach, where we take as input an additional synchronized sparse point cloud (i.e., from a LiDAR) projected onto the image plane as a sparse depth map, along with a camera… ▽ More We propose a method for depth estimation under different illumination conditions, i.e., day and night time. As photometry is uninformative in regions under low-illumination, we tackle the problem through a multi-sensor fusion approach, where we take as input an additional synchronized sparse point cloud (i.e., from a LiDAR) projected onto the image plane as a sparse depth map, along with a camera image. The crux of our method lies in the use of the abundantly available synthetic data to first approximate the 3D scene structure by learning a map** from sparse to (coarse) dense depth maps along with their predictive uncertainty - we term this, SpaDe. In poorly illuminated regions where photometric intensities do not afford the inference of local shape, the coarse approximation of scene depth serves as a prior; the uncertainty map is then used with the image to guide refinement through an uncertainty-driven residual learning (URL) scheme. The resulting depth completion network leverages complementary strengths from both modalities - depth is sparse but insensitive to illumination and in metric scale, and image is dense but sensitive with scale ambiguity. SpaDe can be used in a plug-and-play fashion, which allows for 25% improvement when augmented onto existing methods to preprocess sparse depth. We demonstrate URL on the nuScenes dataset where we improve over all baselines by an average 11.65% in all-day scenarios, 11.23% when tested specifically for daytime, and 13.12% for nighttime scenes. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 8 pages, 4 figures

arXiv:2405.17220 [pdf, other]

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

Authors: Tianyu Yu, Haoye Zhang, Yuan Yao, Yunkai Dang, Da Chen, Xiaoman Lu, Ganqu Cui, Taiwen He, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

Abstract: Learning from feedback reduces the hallucination of multimodal large language models (MLLMs) by aligning them with human preferences. While traditional methods rely on labor-intensive and time-consuming manual labeling, recent approaches employing models as automatic labelers have shown promising results without human intervention. However, these methods heavily rely on costly proprietary models l… ▽ More Learning from feedback reduces the hallucination of multimodal large language models (MLLMs) by aligning them with human preferences. While traditional methods rely on labor-intensive and time-consuming manual labeling, recent approaches employing models as automatic labelers have shown promising results without human intervention. However, these methods heavily rely on costly proprietary models like GPT-4V, resulting in scalability issues. Moreover, this paradigm essentially distills the proprietary models to provide a temporary solution to quickly bridge the performance gap. As this gap continues to shrink, the community is soon facing the essential challenge of aligning MLLMs using labeler models of comparable capability. In this work, we introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm for super GPT-4V trustworthiness. RLAIF-V maximally exploits the open-source feedback from two perspectives, including high-quality feedback data and online feedback learning algorithm. Extensive experiments on seven benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models without sacrificing performance on other tasks. Using a 34B model as labeler, RLAIF-V 7B model reduces object hallucination by 82.9\% and overall hallucination by 42.1\%, outperforming the labeler model. Remarkably, RLAIF-V also reveals the self-alignment potential of open-source MLLMs, where a 12B model can learn from the feedback of itself to achieve less than 29.5\% overall hallucination rate, surpassing GPT-4V (45.9\%) by a large margin. The results shed light on a promising route to enhance the efficacy of leading-edge MLLMs. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Project Website: https://github.com/RLHF-V/RLAIF-V

arXiv:2405.17200 [pdf, ps, other]

A Mathematical Theory of Integer Quantum Hall Effect in Photonics

Authors: Jiayu Qiu, Hai Zhang

Abstract: This paper investigates interface modes in a square lattice of photonic crystal composed of gyromagnetic particles with $C_{4v}$ point group symmetry. The study shows that Dirac or linear degenerate points cannot occur at the three high symmetry points in the Brillouin zone where two Bloch bands touch. Instead, a touch point at the M-point has a quadratic degeneracy in the generic case. It is furt… ▽ More This paper investigates interface modes in a square lattice of photonic crystal composed of gyromagnetic particles with $C_{4v}$ point group symmetry. The study shows that Dirac or linear degenerate points cannot occur at the three high symmetry points in the Brillouin zone where two Bloch bands touch. Instead, a touch point at the M-point has a quadratic degeneracy in the generic case. It is further proved that when a magnetic field is applied to the two sides of an interface in opposite directions, two interface modes that are supported along that interface can be bifurcated from the quadratic degenerate point. The results provide a mathematical foundation for the first experiment realization of the integer quantum Hall effect in the context of photonics. △ Less

Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16874 [pdf, other]

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

Authors: Xingqun Qi, Hengyuan Zhang, Yatian Wang, Jiahao Pan, Chen Liu, Peng Li, Xiaowei Chi, Mengfei Li, Qixun Zhang, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo

Abstract: Deriving co-speech 3D gestures has seen tremendous progress in virtual avatar animation. Yet, the existing methods often produce stiff and unreasonable gestures with unseen human speech inputs due to the limited 3D speech-gesture data. In this paper, we propose CoCoGesture, a novel framework enabling vivid and diverse gesture synthesis from unseen human speech prompts. Our key insight is built upo… ▽ More Deriving co-speech 3D gestures has seen tremendous progress in virtual avatar animation. Yet, the existing methods often produce stiff and unreasonable gestures with unseen human speech inputs due to the limited 3D speech-gesture data. In this paper, we propose CoCoGesture, a novel framework enabling vivid and diverse gesture synthesis from unseen human speech prompts. Our key insight is built upon the custom-designed pretrain-fintune training paradigm. At the pretraining stage, we aim to formulate a large generalizable gesture diffusion model by learning the abundant postures manifold. Therefore, to alleviate the scarcity of 3D data, we first construct a large-scale co-speech 3D gesture dataset containing more than 40M meshed posture instances across 4.3K speakers, dubbed GES-X. Then, we scale up the large unconditional diffusion model to 1B parameters and pre-train it to be our gesture experts. At the finetune stage, we present the audio ControlNet that incorporates the human voice as condition prompts to guide the gesture generation. Here, we construct the audio ControlNet through a trainable copy of our pre-trained diffusion model. Moreover, we design a novel Mixture-of-Gesture-Experts (MoGE) block to adaptively fuse the audio embedding from the human speech and the gesture features from the pre-trained gesture experts with a routing mechanism. Such an effective manner ensures audio embedding is temporal coordinated with motion features while preserving the vivid and diverse gesture generation. Extensive experiments demonstrate that our proposed CoCoGesture outperforms the state-of-the-art methods on the zero-shot speech-to-gesture generation. The dataset will be publicly available at: https://mattie-e.github.io/GES-X/ △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: The dataset will be released as soon as possible

arXiv:2405.16856 [pdf, other]

Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer

Authors: Haoyan Yang, Yixuan Wang, Xingyin Xu, Hanyuan Zhang, Yirong Bian

Abstract: The study explores mitigating overconfidence bias in LLMs to improve their reliability. We introduce a knowledge transfer (KT) method utilizing chain of thoughts, where "big" LLMs impart knowledge to "small" LLMs via detailed, sequential reasoning paths. This method uses advanced reasoning of larger models to fine-tune smaller models, enabling them to produce more accurate predictions with calibra… ▽ More The study explores mitigating overconfidence bias in LLMs to improve their reliability. We introduce a knowledge transfer (KT) method utilizing chain of thoughts, where "big" LLMs impart knowledge to "small" LLMs via detailed, sequential reasoning paths. This method uses advanced reasoning of larger models to fine-tune smaller models, enabling them to produce more accurate predictions with calibrated confidence. Experimental evaluation using multiple-choice questions and sentiment analysis across diverse datasets demonstrated the KT method's superiority over the vanilla and question-answer pair (QA) fine-tuning methods. The most significant improvement in three key metrics, where the KT method outperformed the vanilla and QA methods by an average of 55.3% and 43.1%, respectively. These findings underscore the KT method's potential in enhancing model trustworthiness and accuracy, offering precise outputs with well-matched confidence levels across various contexts. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16789 [pdf, other]

NoteLLM-2: Multimodal Large Representation Models for Recommendation

Authors: Chao Zhang, Haoxin Zhang, Shiwei Wu, Di Wu, Tong Xu, Yan Gao, Yao Hu, Enhong Chen

Abstract: Large Language Models (LLMs) have demonstrated exceptional text understanding. Existing works explore their application in text embedding tasks. However, there are few works utilizing LLMs to assist multimodal representation tasks. In this work, we investigate the potential of LLMs to enhance multimodal representation in multimodal item-to-item (I2I) recommendations. One feasible method is the tra… ▽ More Large Language Models (LLMs) have demonstrated exceptional text understanding. Existing works explore their application in text embedding tasks. However, there are few works utilizing LLMs to assist multimodal representation tasks. In this work, we investigate the potential of LLMs to enhance multimodal representation in multimodal item-to-item (I2I) recommendations. One feasible method is the transfer of Multimodal Large Language Models (MLLMs) for representation tasks. However, pre-training MLLMs usually requires collecting high-quality, web-scale multimodal data, resulting in complex training procedures and high costs. This leads the community to rely heavily on open-source MLLMs, hindering customized training for representation scenarios. Therefore, we aim to design an end-to-end training method that customizes the integration of any existing LLMs and vision encoders to construct efficient multimodal representation models. Preliminary experiments show that fine-tuned LLMs in this end-to-end method tend to overlook image content. To overcome this challenge, we propose a novel training framework, NoteLLM-2, specifically designed for multimodal representation. We propose two ways to enhance the focus on visual information. The first method is based on the prompt viewpoint, which separates multimodal content into visual content and textual content. NoteLLM-2 adopts the multimodal In-Content Learning method to teach LLMs to focus on both modalities and aggregate key information. The second method is from the model architecture, utilizing a late fusion mechanism to directly fuse visual information into textual information. Extensive experiments have been conducted to validate the effectiveness of our method. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: 19 pages, 5 figures

arXiv:2405.16664 [pdf]

Deep learning improved autofocus for motion artifact reduction and its application in quantitative susceptibility map**

Authors: Chao Li, **wei Zhang, Hang Zhang, Jiahao Li, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang

Abstract: Purpose: To develop a pipeline for motion artifact correction in mGRE and quantitative susceptibility map** (QSM). Methods: Deep learning is integrated with autofocus to improve motion artifact suppression, which is applied QSM of patients with Parkinson's disease (PD). The estimation of affine motion parameters in the autofocus method depends on signal-to-noise ratio and lacks accuracy when dat… ▽ More Purpose: To develop a pipeline for motion artifact correction in mGRE and quantitative susceptibility map** (QSM). Methods: Deep learning is integrated with autofocus to improve motion artifact suppression, which is applied QSM of patients with Parkinson's disease (PD). The estimation of affine motion parameters in the autofocus method depends on signal-to-noise ratio and lacks accuracy when data sampling occurs outside the k-space center. A deep learning strategy is employed to remove the residual motion artifacts in autofocus. Results: Results obtained in simulated brain data (n =15) with reference truth show that the proposed autofocus deep learning method significantly improves the image quality of mGRE and QSM (p = 0.001 for SSIM, p < 0.0001 for PSNR and RMSE). Results from 10 PD patients with real motion artifacts in QSM have also been corrected using the proposed method and sent to an experienced radiologist for image quality evaluation, and the average image quality score has increased (p=0.0039). Conclusions: The proposed method enables substantial suppression of motion artifacts in mGRE and QSM. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16583 [pdf]

An erbium-doped waveguide amplifier on thin film lithium niobate with an output power exceeding 100 mW

Authors: Rui Bao, Zhiwei Fang, Jian Liu, Zhaoxiang Liu, **ming Chen, Min Wang, Rongbo Wu, Haisu Zhang, Ya Cheng

Abstract: We demonstrate high-power thin film lithium niobate (TFLN) erbium-doped waveguide amplifier (EDWA) with a maximum on-chip output power of 113 mW and a gain of 16 dB. The on-chip integrated EDWA is composed of large mode area (LMA) waveguide structures with a total length of 7 cm and a footprint of 1x1 cm2. Particularly, we connect segmented LMA waveguides with waveguide tapers to achieve on-chip m… ▽ More We demonstrate high-power thin film lithium niobate (TFLN) erbium-doped waveguide amplifier (EDWA) with a maximum on-chip output power of 113 mW and a gain of 16 dB. The on-chip integrated EDWA is composed of large mode area (LMA) waveguide structures with a total length of 7 cm and a footprint of 1x1 cm2. Particularly, we connect segmented LMA waveguides with waveguide tapers to achieve on-chip mode conversion which maintains single-mode propagation all over the EDWA even at the waveguide bends. The design leads to significant increase of the amplified signal power by orders of magnitude and will open an avenue for applications such as on-chip high-power lasers and amplifiers system. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: 13 pages, 4 figures

arXiv:2405.16567 [pdf, other]

Automatic Jailbreaking of the Text-to-Image Generative AI Systems

Authors: Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang

Abstract: Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious contents by circumventing the alignment in LLMs, which are often referred to as jai… ▽ More Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious contents by circumventing the alignment in LLMs, which are often referred to as jailbreaking. However, most of the previous works only focused on the text-based jailbreaking in LLMs, and the jailbreaking of the text-to-image (T2I) generation system has been relatively overlooked. In this paper, we first evaluate the safety of the commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, on copyright infringement with naive prompts. From this empirical study, we find that Copilot and Gemini block only 12% and 17% of the attacks with naive prompts, respectively, while ChatGPT blocks 84% of them. Then, we further propose a stronger automated jailbreaking pipeline for T2I generation systems, which produces prompts that bypass their safety guards. Our automated jailbreaking framework leverages an LLM optimizer to generate prompts to maximize degree of violation from the generated images without any weight updates or gradient computation. Surprisingly, our simple yet effective approach successfully jailbreaks the ChatGPT with 11.0% block rate, making it generate copyrighted contents in 76% of the time. Finally, we explore various defense strategies, such as post-generation filtering and machine unlearning techniques, but found that they were inadequate, which suggests the necessity of stronger defense mechanisms. △ Less

Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: Under review

arXiv:2405.16510 [pdf, other]

Meta-Task Planning for Language Agents

Authors: Cong Zhang, Derrick Goh Xin Deik, Dexun Li, Hao Zhang, Yong Liu

Abstract: The rapid advancement of neural language models has sparked a new surge of intelligent agent research. Unlike traditional agents, large language model-based agents (LLM agents) have emerged as a promising paradigm for achieving artificial general intelligence (AGI) due to their superior reasoning and generalization capabilities. Effective planning is crucial for the success of LLM agents in real-w… ▽ More The rapid advancement of neural language models has sparked a new surge of intelligent agent research. Unlike traditional agents, large language model-based agents (LLM agents) have emerged as a promising paradigm for achieving artificial general intelligence (AGI) due to their superior reasoning and generalization capabilities. Effective planning is crucial for the success of LLM agents in real-world tasks, making it a highly pursued topic in the community. Current planning methods typically translate tasks into executable action sequences. However, determining a feasible or optimal sequence for complex tasks at fine granularity, which often requires compositing long chains of heterogeneous actions, remains challenging. This paper introduces Meta-Task Planning (MTP), a zero-shot methodology for collaborative LLM-based multi-agent systems that simplifies complex task planning by decomposing it into a hierarchy of subordinate tasks, or meta-tasks. Each meta-task is then mapped into executable actions. MTP was assessed on two rigorous benchmarks, TravelPlanner and API-Bank. Notably, MTP achieved an average $\sim40\%$ success rate on TravelPlanner, significantly higher than the state-of-the-art (SOTA) baseline ($2.92\%$), and outperforming $LLM_{api}$-4 with ReAct on API-Bank by $\sim14\%$, showing the immense potential of integrating LLM with multi-agent systems. △ Less

Submitted 30 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16395 [pdf, other]

Daily Physical Activity Monitoring -- Adaptive Learning from Multi-source Motion Sensor Data

Authors: Haoting Zhang, Donglin Zhan, Yunduan Lin, **ghai He, Qing Zhu, Zuo-Jun Max Shen, Zeyu Zheng

Abstract: In healthcare applications, there is a growing need to develop machine learning models that use data from a single source, such as that from a wrist wearable device, to monitor physical activities, assess health risks, and provide immediate health recommendations or interventions. However, the limitation of using single-source data often compromises the model's accuracy, as it fails to capture the… ▽ More In healthcare applications, there is a growing need to develop machine learning models that use data from a single source, such as that from a wrist wearable device, to monitor physical activities, assess health risks, and provide immediate health recommendations or interventions. However, the limitation of using single-source data often compromises the model's accuracy, as it fails to capture the full scope of human activities. While a more comprehensive dataset can be gathered in a lab setting using multiple sensors attached to various body parts, this approach is not practical for everyday use due to the impracticality of wearing multiple sensors. To address this challenge, we introduce a transfer learning framework that optimizes machine learning models for everyday applications by leveraging multi-source data collected in a laboratory setting. We introduce a novel metric to leverage the inherent relationship between these multiple data sources, as they are all paired to capture aspects of the same physical activity. Through numerical experiments, our framework outperforms existing methods in classification accuracy and robustness to noise, offering a promising avenue for the enhancement of daily activity monitoring. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16256 [pdf, other]

HetHub: A Heterogeneous distributed hybrid training system for large-scale models

Authors: Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Haolin Ye, Sipei Gu, Chunsheng Shui, Zhezheng Lin, Hao Zhang, Sheng Wang, Guohao Dai, Yu Wang

Abstract: The development of large-scale models relies on a vast number of computing resources. For example, the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs for its training. It is a challenge to build a large-scale cluster with a type of GPU-accelerator. Using multiple types of GPU-accelerators to construct a cluster is an effective way to solve the problem of insufficient homogeneous GP… ▽ More The development of large-scale models relies on a vast number of computing resources. For example, the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs for its training. It is a challenge to build a large-scale cluster with a type of GPU-accelerator. Using multiple types of GPU-accelerators to construct a cluster is an effective way to solve the problem of insufficient homogeneous GPU-accelerators. However, the existing distributed training systems for large-scale models only support homogeneous GPU-accelerators, not heterogeneous GPU-accelerators. To address the problem, this paper proposes a distributed training system with hybrid parallelism support on heterogeneous GPU-accelerators for large-scale models. It introduces a distributed unified communicator to realize the communication between heterogeneous GPU-accelerators, a distributed performance predictor, and an automatic hybrid parallel module to develop and train models efficiently with heterogeneous GPU-accelerators. Compared to the distributed training system with homogeneous GPU-accelerators, our system can support six different combinations of heterogeneous GPU-accelerators and the optimal performance of heterogeneous GPU-accelerators has achieved at least 90% of the theoretical upper bound performance of homogeneous GPU-accelerators. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16184 [pdf, other]

Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions

Authors: Harry Zhang

Abstract: Model-based Reinforcement Learning (MBRL) has shown many desirable properties for intelligent control tasks. However, satisfying safety and stability constraints during training and rollout remains an open question. We propose a new Model-based RL framework to enable efficient policy learning with unknown dynamics based on learning model predictive control (LMPC) framework with mathematically prov… ▽ More Model-based Reinforcement Learning (MBRL) has shown many desirable properties for intelligent control tasks. However, satisfying safety and stability constraints during training and rollout remains an open question. We propose a new Model-based RL framework to enable efficient policy learning with unknown dynamics based on learning model predictive control (LMPC) framework with mathematically provable guarantees of stability. We introduce and explore a novel method for adding safety constraints for model-based RL during training and policy learning. The new stability-augmented framework consists of a neural-network-based learner that learns to construct a Lyapunov function, and a model-based RL agent to consistently complete the tasks while satisfying user-specified constraints given only sub-optimal demonstrations and sparse-cost feedback. We demonstrate the capability of the proposed framework through simulated experiments. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16165 [pdf, ps, other]

General Discussions on the SU(2) Vector Boson Dark Matter Model with a Single Higgs Multiplet -- Lagrangian, Discrete Subgroups, and Scalar Classifications

Authors: Chun-Xue Yuan, Zhao Zhang, Chengfeng Cai, Yi-Lei Tang, Hong-Hao Zhang

Abstract: The vector boson dark matter particles which stem from some broken gauge symmetries usually requires some unbroken symmetries to keep themselves stable. In the previous literature, some simplest cases have been discussed, in which the unbroken symmetry is provided by a remnant subgroup of the gauge group. It would be interesting to ask whether all the possible remnant subgroups as well as all the… ▽ More The vector boson dark matter particles which stem from some broken gauge symmetries usually requires some unbroken symmetries to keep themselves stable. In the previous literature, some simplest cases have been discussed, in which the unbroken symmetry is provided by a remnant subgroup of the gauge group. It would be interesting to ask whether all the possible remnant subgroups as well as all the possible coupling forms can be enumerated. Classifying all the Higgs components into different mass degenerate representations to simplify the diagonalization processes is also necessary. Rather than the ambitious target of providing a general solution to all kinds of gauge groups configured with all forms of the Higgs multiplets, in this paper, we concentrate on the case of $\text{SU(2)}_{\text{D}}$ gauge group together with a single Higgs multiplet. We enumerate all possible discrete subgroups that can survive up to $n=21$, where $n$ is the dimension of the Higgs multiplet. We also provide the general algorithms to enumerate all possible renormalizable operators, to write down the general forms of the vacuum expectation value (VEV) configurations, and to give the detailed results of all the mass degenerate irreducible representations embedded in the Higgs multiplet. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16054 [pdf, ps, other]

Rescattering effects in $b\to s$ processes

Authors: Yao Yu, Hai-Bing Fu, Han Zhang, Bai-Cian Ke

Abstract: The measurements in $b\to s$ penguin-dominated decays are widely recognized as promising avenues for searching for New Physics by studying the deviation from theoretical estimations within the Standard Model. However, most current estimations focus only on short-distance interactions, and the influence oflong-distance interactions may exclude the impact of new physics. This suggests that it is ess… ▽ More The measurements in $b\to s$ penguin-dominated decays are widely recognized as promising avenues for searching for New Physics by studying the deviation from theoretical estimations within the Standard Model. However, most current estimations focus only on short-distance interactions, and the influence oflong-distance interactions may exclude the impact of new physics. This suggests that it is essential to estimate the long-distance contribution before probing New Physics in penguin-dominated decays. We highlights an oversight regarding long-distance interactions, which significantly contribute to the decay $B_s\to K^{*0}\bar{K}^{*0}$. Both short-distance and long-distance interactions play crucial roles in this decay. We give estimations of the branching ratio and longitudinal polarization of $B_s\to K^{*0}\bar{K}^{*0}$, which is consistent with experimental observations. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15994 [pdf, ps, other]

Verified Safe Reinforcement Learning for Neural Network Dynamic Models

Authors: Junlin Wu, Huan Zhang, Yevgeniy Vorobeychik

Abstract: Learning reliably safe autonomous control is one of the core problems in trustworthy autonomy. However, training a controller that can be formally verified to be safe remains a major challenge. We introduce a novel approach for learning verified safe control policies in nonlinear neural dynamical systems while maximizing overall performance. Our approach aims to achieve safety in the sense of fini… ▽ More Learning reliably safe autonomous control is one of the core problems in trustworthy autonomy. However, training a controller that can be formally verified to be safe remains a major challenge. We introduce a novel approach for learning verified safe control policies in nonlinear neural dynamical systems while maximizing overall performance. Our approach aims to achieve safety in the sense of finite-horizon reachability proofs, and is comprised of three key parts. The first is a novel curriculum learning scheme that iteratively increases the verified safe horizon. The second leverages the iterative nature of gradient-based learning to leverage incremental verification, reusing information from prior verification runs. Finally, we learn multiple verified initial-state-dependent controllers, an idea that is especially valuable for more complex domains where learning a single universal verified safe controller is extremely challenging. Our experiments on five safe control problems demonstrate that our trained controllers can achieve verified safety over horizons that are as much as an order of magnitude longer than state-of-the-art baselines, while maintaining high reward, as well as a perfect safety record over entire episodes. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15571 [pdf, other]

RCInvestigator: Towards Better Investigation of Anomaly Root Causes in Cloud Computing Systems

Authors: Shuhan Liu, Yunfan Zhou, Lu Ying, Yuan Tian, Jue Zhang, Shandan Zhou, Weiwei Cui, Qingwei Lin, Thomas Moscibroda, Haidong Zhang, Di Weng, Yingcai Wu

Abstract: Finding the root causes of anomalies in cloud computing systems quickly is crucial to ensure availability and efficiency since accurate root causes can guide engineers to take appropriate actions to address the anomalies and maintain customer satisfaction. However, it is difficult to investigate and identify the root causes based on large-scale and high-dimension monitoring data collected from com… ▽ More Finding the root causes of anomalies in cloud computing systems quickly is crucial to ensure availability and efficiency since accurate root causes can guide engineers to take appropriate actions to address the anomalies and maintain customer satisfaction. However, it is difficult to investigate and identify the root causes based on large-scale and high-dimension monitoring data collected from complex cloud computing environments. Due to the inherently dynamic characteristics of cloud computing systems, the existing approaches in practice largely rely on manual analyses for flexibility and reliability, but massive unpredictable factors and high data complexity make the process time-consuming. Despite recent advances in automated detection and investigation approaches, the speed and quality of root cause analyses remain limited by the lack of expert involvement in these approaches. The limitations found in the current solutions motivate us to propose a visual analytics approach that facilitates the interactive investigation of the anomaly root causes in cloud computing systems. We identified three challenges, namely, a) modeling databases for the root cause investigation, b) inferring root causes from large-scale time series, and c) building comprehensible investigation results. In collaboration with domain experts, we addressed these challenges with RCInvestigator, a novel visual analytics system that establishes a tight collaboration between human and machine and assists experts in investigating the root causes of cloud computing system anomalies. We evaluated the effectiveness of RCInvestigator through two use cases based on real-world data and received positive feedback from experts. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15297 [pdf]

doi 10.1103/PhysRevB.109.184112

High-field magnetoelectric coupling and successive magnetic transitions in Mn-doped polar antiferromagnet Ni3TeO6

Authors: J. H. Zhang, L. Lin, C. Dong, Y. T. Chang, J. F. Wang, C. L. Lu, P. Z. Chen, W. J. Zhai, G. Z. Zhou, L. Huang, Y. S. Tang, S. H. Zheng, M. F. Liu, X. H. Zhou, Z. B. Yan, J. -M. Liu

Abstract: Among the 3d transition metal ions doped polar Ni3TeO6, Mn-doped Ni3TeO6 has stimulated great interest due to its high magnetic ordering temperature and complex magnetic phases, but the mechanism of magnetoelectric (ME) coupling is far from understood. Herein we report our systematic investigation of the chemical control of magnetism, metamagnetic transition, and ME properties of Ni3-xMnxTeO6 sing… ▽ More Among the 3d transition metal ions doped polar Ni3TeO6, Mn-doped Ni3TeO6 has stimulated great interest due to its high magnetic ordering temperature and complex magnetic phases, but the mechanism of magnetoelectric (ME) coupling is far from understood. Herein we report our systematic investigation of the chemical control of magnetism, metamagnetic transition, and ME properties of Ni3-xMnxTeO6 single crystals in high magnetic field (H) up to 52 T. We present a previously unreported weak ferromagnetic behavior appeared in the ab plane below 9.5 K in addition to the incommensurate helical and commensurate collinear antiferromagnetic states. In the low-field region, a spin-flop type metamagnetic transition without any hysteresis occurs at Hc1 for H // c, while another metamagnetic transition accompanied with a change in electric polarization is observed at Hc2 in the high-field region both for H // c and H // ab above 30 K, which can be attributed to the sudden rotation of magnetic moments at Ni2 sites. The ME measurements reveal that a first-order ME effect is observed in the low-T and low-H regions, while a second-order ME coupling term appears above 30 K in the magnetic field range of Hc1 < H < Hc2 for H // c and H < Hc2 for H // ab, both becoming significant with increasing temperature. Eventually, they are dominated by the second-order ME effect near the antiferromagnetic transition temperature. The present work demonstrates that Ni3-xMnxTeO6 is an exotic magnetoelectric material compared with Ni3TeO6 and its derivatives, thereby providing insights to better understand the magnetism and ME coupling in Ni3TeO6 and its derivatives. △ Less

Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 30 pages with 8 figures

Journal ref: Phys. Rev. B 109, 184112 (2024)

arXiv:2405.15208 [pdf, other]

Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

Authors: Chenxi Sun, Hongzhi Zhang, Zijia Lin, **gyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong

Abstract: Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the d… ▽ More Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the decoding process without sacrificing output quality. The core of our approach is the observation that a pre-trained language model can confidently predict multiple contiguous tokens, forming the basis for a \textit{lexical unit}, in which these contiguous tokens could be decoded in parallel. Extensive experiments validate that our method substantially reduces decoding time while maintaining generation quality, i.e., 33\% speed up on natural language generation with no quality loss, and 30\% speed up on code generation with a negligible quality loss of 3\%. Distinctively, LUD requires no auxiliary models and does not require changes to existing architectures. It can also be integrated with other decoding acceleration methods, thus achieving an even more pronounced inference efficiency boost. We posit that the foundational principles of LUD could define a new decoding paradigm for future language models, enhancing their applicability for a broader spectrum of applications. All codes are be publicly available at https://github.com/tjunlp-lab/Lexical-Unit-Decoding-LUD-. Keywords: Parallel Decoding, Lexical Unit Decoding, Large Language Model △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted for publication at LREC-COLING 2024

arXiv:2405.15153 [pdf, other]

Optimal Reference Nodes Deployment for Positioning Seafloor Anchor Nodes

Authors: Wei Huang, Pengfei Wu, Tianhe Xu, Hao Zhang, Kaitao Meng

Abstract: Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurate positioning of underwater anchor nodes is a challenge work. Traditional anchor node positioning typically uses cross or circular shapes, however, how to optimize the… ▽ More Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurate positioning of underwater anchor nodes is a challenge work. Traditional anchor node positioning typically uses cross or circular shapes, however, how to optimize the deployment of reference nodes for positioning underwater anchor nodes considering the variability of sound speed has not yet been studied. This paper focuses on the optimal reference nodes deployment strategies for time--of--arrival (TOA) localization in the three-dimensional (3D) underwater space. We adopt the criterion that minimizing the trace of the inverse Fisher information matrix (FIM) to determine optimal reference nodes deployment with Gaussian measurement noise, which is positive related to the signal propagation path. A comprehensive analysis of optimal reference-target geometries is provided in the general circumstance with no restriction on the number of reference nodes, elevation angle and reference-target range. A new semi-closed form solution is found to detemine the optimal geometries. To demonstrate the findings in this paper, we conducted both simulations and sea trials on underwater anchor node positioning. Both the simulation and experiment results are consistent with theoretical analysis. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.15130 [pdf, other]

OptLLM: Optimal Assignment of Queries to Large Language Models

Authors: Yueyue Liu, Hongyu Zhang, Yuantian Miao, Van-Hoang Le, Zhiqiang Li

Abstract: Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs. A challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressi… ▽ More Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs. A challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressing the cost-effective query allocation problem for LLMs. Given a set of input queries and candidate LLMs, our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences, including options for maximizing accuracy and minimizing cost. OptLLM predicts the performance of candidate LLMs on each query using a multi-label classification model with uncertainty estimation and then iteratively generates a set of non-dominated solutions by destructing and reconstructing the current solution. To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing. Our experimental results demonstrate that OptLLM substantially reduces costs by 2.40% to 49.18% while achieving the same accuracy as the best LLM. Compared to other multi-objective optimization algorithms, OptLLM improves accuracy by 2.94% to 69.05% at the same cost or saves costs by 8.79% and 95.87% while maintaining the highest attainable accuracy. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: This paper is accepted by ICWS 2024

arXiv:2405.14893 [pdf, other]

YUI: Day-ahead Electricity Price Forecasting Using Invariance Simplified Supply and Demand Curve

Authors: Linian Wang, Anlan Yu, Jianghong Liu, Huibing Zhang, Leye Wang

Abstract: In day-ahead electricity market, it is crucial for all market participants to have access to reliable and accurate price forecasts for their decision-making processes. Forecasting methods currently utilized in industrial applications frequently neglect the underlying mechanisms of price formation, while economic research from the perspective of supply and demand have stringent data collection requ… ▽ More In day-ahead electricity market, it is crucial for all market participants to have access to reliable and accurate price forecasts for their decision-making processes. Forecasting methods currently utilized in industrial applications frequently neglect the underlying mechanisms of price formation, while economic research from the perspective of supply and demand have stringent data collection requirements, making it difficult to apply in actual markets. Observing the characteristics of the day-ahead electricity market, we introduce two invariance assumptions to simplify the modeling of supply and demand curves. Upon incorporating the time invariance assumption, we can forecast the supply curve using the market equilibrium points from multiple time slots in the recent period. By introducing the price insensitivity assumption, we can approximate the demand curve using a straight line. The point where these two curves intersect provides us with the forecast price. The proposed model, forecasting suppl\textbf{Y} and demand cUrve simplified by Invariance, termed as YUI, is more efficient than state-of-the-art methods. Our experiment results in Shanxi day-ahead electricity market show that compared with existing methods, YUI can reduce forecast error by 13.8\% in MAE and 28.7\% in sMAPE. Code is publicly available at https://github.com/wangln19/YUI. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: 13 pages

arXiv:2405.14866 [pdf, other]

Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras

Authors: Hanzhang Tu, Ruizhi Shao, Xue Dong, Shunyuan Zheng, Hao Zhang, Lili Chen, Meili Wang, Wenyu Li, Siyan Ma, Sheng** Zhang, Boyao Zhou, Yebin Liu

Abstract: In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha utilizes only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048x2048), real-time (30 fps), low-latency (less than 150ms) and robust distant… ▽ More In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha utilizes only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048x2048), real-time (30 fps), low-latency (less than 150ms) and robust distant communication. As the core of Tele-Aloha, we propose an efficient novel view synthesis algorithm for upper-body. Firstly, we design a cascaded disparity estimator for obtaining a robust geometry cue. Additionally a neural rasterizer via Gaussian Splatting is introduced to project latent features onto target view and to decode them into a reduced resolution. Further, given the high-quality captured data, we leverage weighted blending mechanism to refine the decoded image into the final resolution of 2K. Exploiting world-leading autostereoscopic display and low-latency iris tracking, users are able to experience a strong three-dimensional sense even without any wearable head-mounted display device. Altogether, our telepresence system demonstrates the sense of co-presence in real-life experiments, inspiring the next generation of communication. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Paper accepted by SIGGRAPH 2024. Project page: http://118.178.32.38/c/Tele-Aloha/

arXiv:2405.14751 [pdf, other]

AGILE: A Novel Framework of LLM Agents

Authors: Peiyuan Feng, Yichen He, Guanhua Huang, Yuan Lin, Hanchong Zhang, Yuchen Zhang, Hang Li

Abstract: We introduce a novel framework of LLM agents named AGILE (AGent that Interacts and Learns from Environments) designed to perform complex conversational tasks with users, leveraging LLMs, memory, tools, and interactions with experts. The agent's abilities include not only conversation but also reflection, utilization of tools, and consultation with experts. We formulate the construction of such an… ▽ More We introduce a novel framework of LLM agents named AGILE (AGent that Interacts and Learns from Environments) designed to perform complex conversational tasks with users, leveraging LLMs, memory, tools, and interactions with experts. The agent's abilities include not only conversation but also reflection, utilization of tools, and consultation with experts. We formulate the construction of such an LLM agent as a reinforcement learning problem, in which the LLM serves as the policy model. We fine-tune the LLM using labeled data of actions and the PPO algorithm. We focus on question answering and release a dataset for agents called ProductQA, comprising challenging questions in online shop**. Our extensive experiments on ProductQA and MedMCQA show that AGILE agents based on 13B and 7B LLMs trained with PPO can outperform GPT-4 agents. Our ablation study highlights the indispensability of memory, tools, consultation, reflection, and reinforcement learning in achieving the agent's strong performance. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14672 [pdf, other]

Towards Imperceptible Backdoor Attack in Self-supervised Learning

Authors: Hanrong Zhang, Zhenting Wang, Tingxu Han, Mingyu **, Chenlu Zhan, Mengnan Du, Hongwei Wang, Shiqing Ma

Abstract: Self-supervised learning models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective in self-supervised learning often involve noticeable triggers, like colored patches, which are vulnerable to human inspection. In this paper, we propose an imperceptible and effective backdoor attack against self-supervised models. We first find that existing imperceptible triggers desi… ▽ More Self-supervised learning models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective in self-supervised learning often involve noticeable triggers, like colored patches, which are vulnerable to human inspection. In this paper, we propose an imperceptible and effective backdoor attack against self-supervised models. We first find that existing imperceptible triggers designed for supervised learning are not as effective in compromising self-supervised models. We then identify this ineffectiveness is attributed to the overlap in distributions between the backdoor and augmented samples used in self-supervised learning. Building on this insight, we design an attack using optimized triggers that are disentangled to the augmented transformation in the self-supervised learning, while also remaining imperceptible to human vision. Experiments on five datasets and seven SSL algorithms demonstrate our attack is highly effective and stealthy. It also has strong resistance to existing backdoor defenses. Our code can be found at https://github.com/Zhang-Henry/IMPERATIVE. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14578 [pdf, other]

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

Authors: Shuaipeng Li, Penghao Zhao, Hailin Zhang, Xingwu Sun, Hao Wu, Dian Jiao, Weiyan Wang, Chengjun Liu, Zheng Fang, **bao Xue, Yangyu Tao, Bin Cui, Di Wang

Abstract: In current deep learning tasks, Adam style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require c… ▽ More In current deep learning tasks, Adam style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require careful tuning to enable effective convergence. Previous research has shown that the optimal learning rate increases linearly or follows similar rules with batch size for SGD style optimizers. However, this conclusion is not applicable to Adam style optimizers. In this paper, we elucidate the connection between optimal learning rates and batch sizes for Adam style optimizers through both theoretical analysis and extensive experiments. First, we raise the scaling law between batch sizes and optimal learning rates in the sign of gradient case, in which we prove that the optimal learning rate first rises and then falls as the batch size increases. Moreover, the peak value of the surge will gradually move toward the larger batch size as training progresses. Second, we conducted experiments on various CV and NLP tasks and verified the correctness of the scaling law. △ Less

Submitted 4 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14545 [pdf, other]

A Cross-Field Fusion Strategy for Drug-Target Interaction Prediction

Authors: Hongzhi Zhang, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu

Abstract: Drug-target interaction (DTI) prediction is a critical component of the drug discovery process. In the drug development engineering field, predicting novel drug-target interactions is extremely crucial.However, although existing methods have achieved high accuracy levels in predicting known drugs and drug targets, they fail to utilize global protein information during DTI prediction. This leads to… ▽ More Drug-target interaction (DTI) prediction is a critical component of the drug discovery process. In the drug development engineering field, predicting novel drug-target interactions is extremely crucial.However, although existing methods have achieved high accuracy levels in predicting known drugs and drug targets, they fail to utilize global protein information during DTI prediction. This leads to an inability to effectively predict interaction the interactions between novel drugs and their targets. As a result, the cross-field information fusion strategy is employed to acquire local and global protein information. Thus, we propose the siamese drug-target interaction SiamDTI prediction method, which utilizes a double channel network structure for cross-field supervised learning.Experimental results on three benchmark datasets demonstrate that SiamDTI achieves higher accuracy levels than other state-of-the-art (SOTA) methods on novel drugs and targets.Additionally, SiamDTI's performance with known drugs and targets is comparable to that of SOTA approachs. The code is available at https://anonymous.4open.science/r/DDDTI-434D. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14541 [pdf, other]

doi 10.1038/s42005-024-01671-0

Concurrence of directional Kondo transport and incommensurate magnetic order in the layered material AgCrSe$_2$

Authors: José Guimarães, Dorsa S. Fartab, Michal Moravec, Marcus Schmidt, Michael Baenitz, Burkhard Schmidt, Hai**g Zhang

Abstract: In this work, we report on the concurrent emergence of the directional Kondo behavior and incommensurate magnetic ordering in a layered material. We employ temperature- and magnetic field-dependent resistivity measurements, susceptibility measurements, and high resolution wavelength X-ray diffraction spectroscopy to study the electronic properties of AgCrSe$_2$. Impurity Kondo behavior with a char… ▽ More In this work, we report on the concurrent emergence of the directional Kondo behavior and incommensurate magnetic ordering in a layered material. We employ temperature- and magnetic field-dependent resistivity measurements, susceptibility measurements, and high resolution wavelength X-ray diffraction spectroscopy to study the electronic properties of AgCrSe$_2$. Impurity Kondo behavior with a characteristic temperature of $T_\text K$ = 32 K is identified through quantitative analysis of the in-plane resistivity, substantiated by magneto-transport measurements. The agreement between our experimental data and the Schlottmann's scaling theory allows us to determine the impurity spin as $S$ = 3/2. Furthermore, we discuss the origin of the Kondo behavior and its relation to the material's antiferromagnetic transition. Our study uncovers an unusual phenomenon -- the equivalence of the Néel temperature and the Kondo temperature -- paving the way for further investigations into the intricate interplay between impurity physics and magnetic phenomena in quantum materials, with potential applications in advanced electronic and magnetic devices. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: 11 pages, 4 figures

Journal ref: Commun Phys 7, 176 (2024)

arXiv:2405.14480 [pdf, other]

Scalable Visual State Space Model with Fractal Scanning

Authors: Lv Tang, HaoKe Xiao, Peng-Tao Jiang, Hao Zhang, **wei Chen, Bo Li

Abstract: Foundational models have significantly advanced in natural language processing (NLP) and computer vision (CV), with the Transformer architecture becoming a standard backbone. However, the Transformer's quadratic complexity poses challenges for handling longer sequences and higher resolution images. To address this challenge, State Space Models (SSMs) like Mamba have emerged as efficient alternativ… ▽ More Foundational models have significantly advanced in natural language processing (NLP) and computer vision (CV), with the Transformer architecture becoming a standard backbone. However, the Transformer's quadratic complexity poses challenges for handling longer sequences and higher resolution images. To address this challenge, State Space Models (SSMs) like Mamba have emerged as efficient alternatives, initially matching Transformer performance in NLP tasks and later surpassing Vision Transformers (ViTs) in various CV tasks. To improve the performance of SSMs, one crucial aspect is effective serialization of image patches. Existing methods, relying on linear scanning curves, often fail to capture complex spatial relationships and produce repetitive patterns, leading to biases. To address these limitations, we propose using fractal scanning curves for patch serialization. Fractal curves maintain high spatial proximity and adapt to different image resolutions, avoiding redundancy and enhancing SSMs' ability to model complex patterns accurately. We validate our method in image classification, detection, and segmentation tasks, and the superior performance validates its effectiveness. △ Less

Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: This paper is working in progress

arXiv:2405.14407 [pdf, other]

Gradient Transformation: Towards Efficient and Model-Agnostic Unlearning for Dynamic Graph Neural Networks

Authors: He Zhang, Bang Wu, Xiangwen Yang, Xingliang Yuan, Chengqi Zhang, Shirui Pan

Abstract: Graph unlearning has emerged as an essential tool for safeguarding user privacy and mitigating the negative impacts of undesirable data. Meanwhile, the advent of dynamic graph neural networks (DGNNs) marks a significant advancement due to their superior capability in learning from dynamic graphs, which encapsulate spatial-temporal variations in diverse real-world applications (e.g., traffic foreca… ▽ More Graph unlearning has emerged as an essential tool for safeguarding user privacy and mitigating the negative impacts of undesirable data. Meanwhile, the advent of dynamic graph neural networks (DGNNs) marks a significant advancement due to their superior capability in learning from dynamic graphs, which encapsulate spatial-temporal variations in diverse real-world applications (e.g., traffic forecasting). With the increasing prevalence of DGNNs, it becomes imperative to investigate the implementation of dynamic graph unlearning. However, current graph unlearning methodologies are designed for GNNs operating on static graphs and exhibit limitations including their serving in a pre-processing manner and impractical resource demands. Furthermore, the adaptation of these methods to DGNNs presents non-trivial challenges, owing to the distinctive nature of dynamic graphs. To this end, we propose an effective, efficient, model-agnostic, and post-processing method to implement DGNN unlearning. Specifically, we first define the unlearning requests and formulate dynamic graph unlearning in the context of continuous-time dynamic graphs. After conducting a role analysis on the unlearning data, the remaining data, and the target DGNN model, we propose a method called Gradient Transformation and a loss function to map the unlearning request to the desired parameter update. Evaluations on six real-world datasets and state-of-the-art DGNN backbones demonstrate its effectiveness (e.g., limited performance drop even obvious improvement) and efficiency (e.g., at most 7.23$\times$ speed-up) outperformance, and potential advantages in handling future unlearning requests (e.g., at most 32.59$\times$ speed-up). △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14358 [pdf, other]

AI-Olympics: Exploring the Generalization of Agents through Open Competitions

Authors: Chen Wang, Yan Song, Shuai Wu, Sa Wu, Ruizhi Zhang, Shu Lin, Haifeng Zhang

Abstract: Between 2021 and 2023, AI-Olympics, a series of online AI competitions was hosted by the online evaluation platform Jidi in collaboration with the IJCAI committee. In these competitions, an agent is required to accomplish diverse sports tasks in a two-dimensional continuous world, while competing against an opponent. This paper provides a brief overview of the competition series and highlights not… ▽ More Between 2021 and 2023, AI-Olympics, a series of online AI competitions was hosted by the online evaluation platform Jidi in collaboration with the IJCAI committee. In these competitions, an agent is required to accomplish diverse sports tasks in a two-dimensional continuous world, while competing against an opponent. This paper provides a brief overview of the competition series and highlights notable findings. We aim to contribute insights to the field of multi-agent decision-making and explore the generalization of agents through engineering efforts. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: IJCAI 2024 Demo Track Paper

arXiv:2405.14210 [pdf, other]

Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds

Authors: Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang

Abstract: Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect tha… ▽ More Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect that a seemingly well-trained model ends up misclassifying the input. This paper adds to the understanding of adversarial attacks by presenting Eidos, a framework providing Efficient Imperceptible aDversarial attacks on 3D pOint cloudS. Eidos supports a diverse set of imperceptibility metrics. It employs an iterative, two-step procedure to identify optimal adversarial examples, thereby enabling a runtime-imperceptibility trade-off. We provide empirical evidence relative to several popular 3D point cloud classification models and several established 3D attack methods, showing Eidos' superiority with respect to efficiency as well as imperceptibility. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2405.14017 [pdf, other]

MagicPose4D: Crafting Articulated Models with Appearance and Motion Control

Authors: Hao Zhang, Di Chang, Fang Li, Mohammad Soleymani, Narendra Ahuja

Abstract: With the success of 2D and 3D visual generative models, there is growing interest in generating 4D content. Existing methods primarily rely on text prompts to produce 4D content, but they often fall short of accurately defining complex or rare motions. To address this limitation, we propose MagicPose4D, a novel framework for refined control over both appearance and motion in 4D generation. Unlike… ▽ More With the success of 2D and 3D visual generative models, there is growing interest in generating 4D content. Existing methods primarily rely on text prompts to produce 4D content, but they often fall short of accurately defining complex or rare motions. To address this limitation, we propose MagicPose4D, a novel framework for refined control over both appearance and motion in 4D generation. Unlike traditional methods, MagicPose4D accepts monocular videos as motion prompts, enabling precise and customizable motion generation. MagicPose4D comprises two key modules: i) Dual-Phase 4D Reconstruction Module} which operates in two phases. The first phase focuses on capturing the model's shape using accurate 2D supervision and less accurate but geometrically informative 3D pseudo-supervision without imposing skeleton constraints. The second phase refines the model using more accurate pseudo-3D supervision, obtained in the first phase and introduces kinematic chain-based skeleton constraints to ensure physical plausibility. Additionally, we propose a Global-local Chamfer loss that aligns the overall distribution of predicted mesh vertices with the supervision while maintaining part-level alignment without extra annotations. ii) Cross-category Motion Transfer Module} leverages the predictions from the 4D reconstruction module and uses a kinematic-chain-based skeleton to achieve cross-category motion transfer. It ensures smooth transitions between frames through dynamic rigidity, facilitating robust generalization without additional training. Through extensive experiments, we demonstrate that MagicPose4D significantly improves the accuracy and consistency of 4D content generation, outperforming existing methods in various benchmarks. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: Project Page: https://boese0601.github.io/magicpose4d

arXiv:2405.13792 [pdf, other]

xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

Authors: Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao

Abstract: This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation. xRAG reinterprets document embeddings in dense retrieval--traditionally used solely for retrieval--as features from the retrieval modality. By employing a modality fusion methodology, xRAG seamlessly integrates these embeddings into the language model representation space, effectively… ▽ More This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation. xRAG reinterprets document embeddings in dense retrieval--traditionally used solely for retrieval--as features from the retrieval modality. By employing a modality fusion methodology, xRAG seamlessly integrates these embeddings into the language model representation space, effectively eliminating the need for their textual counterparts and achieving an extreme compression rate. In xRAG, the only trainable component is the modality bridge, while both the retriever and the language model remain frozen. This design choice allows for the reuse of offline-constructed document embeddings and preserves the plug-and-play nature of retrieval augmentation. Experimental results demonstrate that xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks, adaptable to various language model backbones, ranging from a dense 7B model to an 8x7B Mixture of Experts configuration. xRAG not only significantly outperforms previous context compression methods but also matches the performance of uncompressed models on several datasets, while reducing overall FLOPs by a factor of 3.53. Our work pioneers new directions in retrieval-augmented generation from the perspective of multimodality fusion, and we hope it lays the foundation for future efficient and scalable retrieval-augmented systems △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13661 [pdf, ps, other]

Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

Authors: Hong Zhang, Jie Lin, Shengxuan Chen

Abstract: Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and… ▽ More Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and techniques of timbre space, a powerful tool for visualizing how we perceive different timbres. The review further examines recent advancements in timbre manipulation and representation, including the increasingly utilized machine learning techniques. While the underlying neural mechanisms remain partially understood, the article discusses current neuroimaging techniques used to investigate this aspect of perception. Finally, it summarizes key takeaways, identifies promising future research directions, and emphasizes the potential applications of timbre research in music technology, assistive technologies, and our overall understanding of auditory perception. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13529 [pdf]

The correlation between nativelike selection and prototypicality: a multilingual onomasiological case study using semantic embedding

Authors: Huasheng Zhang

Abstract: In native speakers' lexical choices, a concept can be more readily expressed by one expression over another grammatical one, a phenomenon known as nativelike selection (NLS). In previous research, arbitrary chunks such as collocations have been considered crucial for this phenomenon. However, this study examines the possibility of analyzing the semantic motivation and deducibility behind some NLSs… ▽ More In native speakers' lexical choices, a concept can be more readily expressed by one expression over another grammatical one, a phenomenon known as nativelike selection (NLS). In previous research, arbitrary chunks such as collocations have been considered crucial for this phenomenon. However, this study examines the possibility of analyzing the semantic motivation and deducibility behind some NLSs by exploring the correlation between NLS and prototypicality, specifically the onomasiological hypothesis of Grondelaers and Geeraerts (2003, Towards a pragmatic model of cognitive onomasiology. In Hubert Cuyckens, René Dirven & John R. Taylor (eds.), Cognitive approaches to lexical semantics, 67-92. Berlin: De Gruyter Mouton). They hypothesized that "[a] referent is more readily named by a lexical item if it is a salient member of the category denoted by that item". To provide a preliminary investigation of this important but rarely explored phenomenon, a series of innovative methods and procedures, including the use of semantic embedding and interlingual comparisons, is designed. Specifically, potential NLSs are efficiently discovered through an automatic exploratory analysis using topic modeling techniques, and then confirmed by manual inspection through frame semantics. Finally, to account for the NLS in question, cluster analysis and behavioral profile analysis are conducted to uncover a language-specific prototype for the Chinese verb shang 'harm', providing supporting evidence for the correlation between NLS and prototypicality. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13476 [pdf, other]

Restricting Voltage Deviation of DC Microgrids with Critical and Ordinary Nodes

Authors: Handong Bai, Peng Li, Hongwei Zhang

Abstract: Restricting bus voltage deviation is crucial for normal operation of multi-bus DC microgrids, yet it has received insufficient attention due to the conflict between two main control objectives in DC microgrids, i.e., voltage regulation and current sharing. By revealing a necessary and sufficient condition for achieving these two objectives, this paper proposes a compromised distributed control alg… ▽ More Restricting bus voltage deviation is crucial for normal operation of multi-bus DC microgrids, yet it has received insufficient attention due to the conflict between two main control objectives in DC microgrids, i.e., voltage regulation and current sharing. By revealing a necessary and sufficient condition for achieving these two objectives, this paper proposes a compromised distributed control algorithm, which regulates the voltage deviation of all buses by relaxing the accuracy of current sharing. Moreover, for a class of DC Microgrids consisting of both critical nodes and ordinary nodes, this paper proposes a distributed control algorithm that restricts the voltage deviation of critical nodes and simultaneously keeps the current sharing of ordinary nodes. This algorithm also works under plug-and-play settings. Simulations illustrate our theory. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13409 [pdf, other]

Specular Polynomials

Authors: Zhimin Fan, Jie Guo, Yiming Wang, Tianyu Xiao, Hao Zhang, Chenxi Zhou, Zhenyu Chen, Pengpei Hong, Yanwen Guo, Ling-Qi Yan

Abstract: Finding valid light paths that involve specular vertices in Monte Carlo rendering requires solving many non-linear, transcendental equations in high-dimensional space. Existing approaches heavily rely on Newton iterations in path space, which are limited to obtaining at most a single solution each time and easily diverge when initialized with improper seeds. We propose specular polynomials, a Ne… ▽ More Finding valid light paths that involve specular vertices in Monte Carlo rendering requires solving many non-linear, transcendental equations in high-dimensional space. Existing approaches heavily rely on Newton iterations in path space, which are limited to obtaining at most a single solution each time and easily diverge when initialized with improper seeds. We propose specular polynomials, a Newton iteration-free methodology for finding a complete set of admissible specular paths connecting two arbitrary endpoints in a scene. The core is a reformulation of specular constraints into polynomial systems, which makes it possible to reduce the task to a univariate root-finding problem. We first derive bivariate systems utilizing rational coordinate map** between the coordinates of consecutive vertices. Subsequently, we adopt the hidden variable resultant method for variable elimination, converting the problem into finding zeros of the determinant of univariate matrix polynomials. This can be effectively solved through Laplacian expansion for one bounce and a bisection solver for more bounces. Our solution is generic, completely deterministic, accurate for the case of one bounce, and GPU-friendly. We develop efficient CPU and GPU implementations and apply them to challenging glints and caustic rendering. Experiments on various scenarios demonstrate the superiority of specular polynomial-based solutions compared to Newton iteration-based counterparts. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: 13 pages, 13 figures, accepted by SIGGRAPH 2024

ACM Class: I.3.3

arXiv:2405.13315 [pdf, other]

Study of the decays $χ_{cJ}\toΛ\barΛω$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$,… ▽ More Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$, $\mathcal{B}(χ_{c1}\toΛ\barΛω)=({1.01 \pm 0.10 \pm 0.11}) \times 10^{-4}$, and $\mathcal{B}(χ_{c2}\toΛ\barΛω)=({1.40 \pm 0.13 \pm 0.17}) \times 10^{-4}$, where the first uncertainties are statistical and the second are systematic. We observe no clear intermediate structures. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 11 pages, 10 figures

arXiv:2405.13289 [pdf, other]

AUGlasses: Continuous Action Unit based Facial Reconstruction with Low-power IMUs on Smart Glasses

Authors: Yanrong Li, Tengxiang Zhang, Xin Zeng, Yuntao Wang, Haotian Zhang, Yiqiang Chen

Abstract: Recent advancements in augmented reality (AR) have enabled the use of various sensors on smart glasses for applications like facial reconstruction, which is vital to improve AR experiences for virtual social activities. However, the size and power constraints of smart glasses demand a miniature and low-power sensing solution. AUGlasses achieves unobtrusive low-power facial reconstruction by placin… ▽ More Recent advancements in augmented reality (AR) have enabled the use of various sensors on smart glasses for applications like facial reconstruction, which is vital to improve AR experiences for virtual social activities. However, the size and power constraints of smart glasses demand a miniature and low-power sensing solution. AUGlasses achieves unobtrusive low-power facial reconstruction by placing inertial measurement units (IMU) against the temporal area on the face to capture the skin deformations, which are caused by facial muscle movements. These IMU signals, along with historical data on facial action units (AUs), are processed by a transformer-based deep learning model to estimate AU intensities in real-time, which are then used for facial reconstruction. Our results show that AUGlasses accurately predicts the strength (0-5 scale) of 14 key AUs with a cross-user mean absolute error (MAE) of 0.187 (STD = 0.025) and achieves facial reconstruction with a cross-user MAE of 1.93 mm (STD = 0.353). We also integrated various preprocessing and training techniques to ensure robust performance for continuous sensing. Micro-benchmark tests indicate that our system consistently performs accurate continuous facial reconstruction with a fine-tuned cross-user model, achieving an AU MAE of 0.35. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.13243 [pdf]

A Novel Approach to Evaluating Battery Charger Controller Design with Nonlinear PID Controller in an Extendable CHIL Setup

Authors: Shervin Salehi Rad, Micheal Muhlbaier, Oleg Fishman, Javad Chevinly, Elias Nadi, Hua Zhang, Fei Lu

Abstract: The design and development of power electronics converters pose a multitude of challenges. The evaluation of power electronics converters, particularly when operating at high power levels, presents a significant task, offering designers a deeper understanding of the functionality. Several methodologies have been devised to conduct hardware-in-the-loop (HIL) tests, which are classified into two mai… ▽ More The design and development of power electronics converters pose a multitude of challenges. The evaluation of power electronics converters, particularly when operating at high power levels, presents a significant task, offering designers a deeper understanding of the functionality. Several methodologies have been devised to conduct hardware-in-the-loop (HIL) tests, which are classified into two main categories: controller hardware-in-the-loop (CHIL) and power hardware-in-the-loop (PHIL) tests. This paper explores the advantages and drawbacks of these two approaches and introduces a straightforward and cost-effective CHIL method for initial proof of concept and risk-free controller training. This method is based on the interaction between MATLAB/Simulink and Python. To assess the operation of the proposed system, a modeled battery pack (96S1P) is charged using a modeled battery charger converter, employing a non-linear PID controller. In this scenario, the controller benefits from the CC-CV technique for charging the battery pack. The results are presented in the final section. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 2024 IEEE Transportation Electrification Conference & Expo (ITEC 2024) (5 pages, 11 figures)

arXiv:2405.13113 [pdf, other]

MAMMOTH-Subaru. II. Diverse Populations of Circumgalactic Ly$α$ Nebulae at Cosmic Noon

Authors: Mingyu Li, Haibin Zhang, Zheng Cai, Yongming Liang, Nobunari Kashikawa, Ke Ma, Xiaohui Fan, J. Xavier Prochaska, Bjorn H. C. Emonts, Xin Wang, Yun**g Wu, Shiwu Zhang, Qiong Li, Sean D. Johnson, Minghao Yue, Fabrizio Arrigoni Battaia, Sebastiano Cantalupo, Joseph F. Hennawi, Satoshi Kikuta, Yuanhang Ning, Masami Ouchi, Rhythm Shimakawa, Ben Wang, Weichen Wang, Zheng Zheng , et al. (1 additional authors not shown)

Abstract: Circumgalactic Lyman-alpha (Ly$α$) nebulae are gaseous halos around galaxies exhibiting luminous extended Ly$α$ emission. This work investigates Ly$α$ nebulae from deep imaging of $\sim12~\mathrm{deg}^2$ sky, targeted by the MAMMOTH-Subaru survey. Utilizing the wide-field capability of Hyper Suprime-Cam (HSC), we present one of the largest blind Ly$α$ nebula selections, including QSO nebulae, Ly… ▽ More Circumgalactic Lyman-alpha (Ly$α$) nebulae are gaseous halos around galaxies exhibiting luminous extended Ly$α$ emission. This work investigates Ly$α$ nebulae from deep imaging of $\sim12~\mathrm{deg}^2$ sky, targeted by the MAMMOTH-Subaru survey. Utilizing the wide-field capability of Hyper Suprime-Cam (HSC), we present one of the largest blind Ly$α$ nebula selections, including QSO nebulae, Ly$α$ blobs, and radio galaxy nebulae down to typical $2σ$ Ly$α$ surface brightness of $(5-10)\times10^{-18}\mathrm{~erg~s^{-1}~cm^{-2}~arcsec^{-2}}$. The sample contains 117 nebulae with Ly$α$ sizes of 40 - 400 kpc, and the most gigantic one spans about 365 kpc, referred to as the Ivory Nebula. Combining with multiwavelength data, we investigate diverse nebula populations and associated galaxies. We find a small fraction of Ly$α$ nebulae have QSOs ($\sim7\%$), luminous infrared galaxies ($\sim1\%$), and radio galaxies ($\sim 2\%$). Remarkably, among the 28 enormous Ly$α$ nebulae (ELANe) exceeding 100 kpc, about $80\%$ are associated with UV-faint galaxies ($M_\mathrm{UV} > -22$), categorized as Type II ELANe. We underscore that Type II ELANe constitute the majority but remain largely hidden in current galaxy and QSO surveys. Dusty starburst and obscured AGN activity are proposed to explain the nature of Type II ELANe. The SED of stacking all Ly$α$ nebulae also reveals signs of massive dusty star-forming galaxies with obscured AGNs. We propose a model to explain the dusty nature where the diverse populations of Ly$α$ nebula capture massive galaxies at different evolutionary stages undergoing violent assembling. Ly$α$ nebulae provide critical insights into the formation and evolution of today's massive cluster galaxies at cosmic noon. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 26 pages, 10 figures, 3 tables, submitted to ApJS, comments welcome

arXiv:2405.12809 [pdf, other]

Precision measurement of the branching fraction of \boldmath $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (604 additional authors not shown)

Abstract: Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$. The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with sig… ▽ More Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$. The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with significantly improved precision. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: to be submitted to PRD

arXiv:2405.12788 [pdf, other]

What Have We Achieved on Non-autoregressive Translation?

Authors: Yafu Li, Huajian Zhang, Jianhao Yan, Yong**g Yin, Yue Zhang

Abstract: Recent advances have made non-autoregressive (NAT) translation comparable to autoregressive methods (AT). However, their evaluation using BLEU has been shown to weakly correlate with human annotations. Limited research compares non-autoregressive translation and autoregressive translation comprehensively, leaving uncertainty about the true proximity of NAT to AT. To address this gap, we systematic… ▽ More Recent advances have made non-autoregressive (NAT) translation comparable to autoregressive methods (AT). However, their evaluation using BLEU has been shown to weakly correlate with human annotations. Limited research compares non-autoregressive translation and autoregressive translation comprehensively, leaving uncertainty about the true proximity of NAT to AT. To address this gap, we systematically evaluate four representative NAT methods across various dimensions, including human evaluation. Our empirical results demonstrate that despite narrowing the performance gap, state-of-the-art NAT still underperforms AT under more reliable evaluation metrics. Furthermore, we discover that explicitly modeling dependencies is crucial for generating natural language and generalizing to out-of-distribution sequences. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: ACL 2024 Findings

Showing 301–350 of 9,566 results for author: Zhang, H