-
Fine-Tuning Stable Diffusion XL for Stylistic Icon Generation: A Comparison of Caption Size
Authors:
Youssef Sultan,
Jiangqin Ma,
Yu-Ying Liao
Abstract:
In this paper, we show different fine-tuning methods for Stable Diffusion XL, including inference steps and caption customization for each image, to align generated images with the style of a commercial 2D icon training set. We also show how important it is to properly define what "high-quality" really means, especially in a commercial-use environment. As generative AI models continue to gain widespread acceptance and usage, many different ways to optimize and evaluate them for various applications have emerged. Text-to-image models in particular, such as Stable Diffusion XL and DALL-E 3, require distinct evaluation practices to effectively generate high-quality icons in a specific style. Although some images generated in a certain style may have a lower (better) FID score, we show that this score is not conclusive on its own, even for rasterized icons. While FID scores reflect the similarity of generated images to the overall training set, CLIP scores measure the alignment between generated images and their textual descriptions. We show how FID scores miss significant aspects, such as the small minority of pixel differences that matter most in an icon, while CLIP scores lead to misjudging the quality of icons. The CLIP model's understanding of "similarity" is shaped by its own training data, which does not account for feature variation in our style of choice. Our findings highlight the need for specialized evaluation metrics and fine-tuning approaches when generating high-quality commercial icons, potentially leading to more effective and tailored applications of text-to-image models in professional design contexts.
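For reference, the two metrics contrasted in this abstract are commonly defined as below. These are the standard definitions, not formulas taken from the paper: $\mu$ and $\Sigma$ are the mean and covariance of Inception-v3 features for real ($r$) and generated ($g$) images, $E_I$ and $E_T$ are the CLIP image and text encoders, and the scale $w$ differs between implementations (2.5 in the original CLIPScore formulation, 100 in some libraries).

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)

\mathrm{CLIPScore}(x, c) = w \cdot \max\left(\cos\left(E_I(x), E_T(c)\right), 0\right)
```

Because FID compares only set-level feature statistics and CLIPScore only compares an image with its caption, neither term explicitly targets the small, localized pixel differences that the authors argue matter most for icons.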
Submitted 11 July, 2024;
originally announced July 2024.
-
DIDUP: Dynamic Iterative Development for UI Prototyping
Authors:
Jenny Ma,
Karthik Sreedhar,
Vivian Liu,
Sitong Wang,
Pedro Alejandro Perez,
Lydia B. Chilton
Abstract:
Large language models (LLMs) are remarkably good at writing code. A particularly valuable case of human-LLM collaboration is code-based UI prototyping, a method for creating interactive prototypes that allows users to view and fully engage with a user interface. We conduct a formative study of GPT Pilot, a leading LLM-generated code-prototyping system, and find that its inflexibility towards change once development has started leads to weaknesses in failure prevention and dynamic planning; it closely resembles the linear workflow of the waterfall model. We introduce DIDUP, a system for code-based UI prototyping that follows an iterative spiral model, which takes into account changes and iterations that come up during the development process. We propose three novel mechanisms for LLM-generated code-prototyping systems: (1) adaptive planning, where plans should be dynamic and reflect changes during implementation, (2) code injection, where the system should write a minimal amount of code and inject it instead of rewriting code, so users have a better mental model of the code evolution, and (3) lightweight state management, a simplified version of source control so users can quickly revert to different working states. Together, these mechanisms enable users to rapidly develop and iterate on prototypes.
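The lightweight state management idea in (3) can be illustrated with a minimal Python sketch. It is hypothetical and not DIDUP's implementation: the class name `StateManager`, its methods, and the choice to model a project as an in-memory mapping from file paths to contents are all assumptions made for illustration.

```python
from copy import deepcopy


class StateManager:
    """Hypothetical sketch: named snapshots of a prototype's files.

    A project is modelled as a dict mapping file paths to their contents.
    """

    def __init__(self, files: dict[str, str]) -> None:
        self.files = dict(files)
        self._snapshots: list[tuple[str, dict[str, str]]] = []

    def save(self, label: str) -> int:
        """Record the current working state and return its snapshot index."""
        self._snapshots.append((label, deepcopy(self.files)))
        return len(self._snapshots) - 1

    def revert(self, index: int) -> None:
        """Restore the files exactly as they were at snapshot `index`."""
        _, files = self._snapshots[index]
        self.files = deepcopy(files)

    def history(self) -> list[str]:
        """Labels of all saved snapshots, oldest first."""
        return [label for label, _ in self._snapshots]


sm = StateManager({"app.js": "render();"})
v0 = sm.save("initial prototype")
sm.files["app.js"] += "\naddButton();"
sm.save("added button")
sm.revert(v0)
print(sm.files["app.js"])  # render();
```

Reverting restores the chosen snapshot but keeps later snapshots in the history, so a user can move back and forth between working states; a real system would presumably persist snapshots or delegate to a version-control tool.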
Submitted 11 July, 2024;
originally announced July 2024.
-
SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
Authors:
Runmin Zhang,
Jun Ma,
Si-Yuan Cao,
Lun Luo,
Beinan Yu,
Shu-Jie Chen,
Junwei Li,
Hui-Liang Shen
Abstract:
We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent feature map projection are combined to form the learnable architecture of SCPNet, boosting the unsupervised learning framework. SCPNet is the first to achieve effective unsupervised homography estimation on the satellite-map image pair cross-modal dataset, GoogleMap, under [-32,+32] offset on a 128x128 image, outperforming the supervised approach MHN by 14.0% in mean average corner error (MACE). We further conduct extensive experiments on several cross-modal/spectral and manually-made inconsistent datasets, on which SCPNet achieves state-of-the-art (SOTA) performance among unsupervised approaches, with MACEs 49.0%, 25.2%, 36.4%, and 10.7% lower than those of the supervised approach MHN. Source code is available at https://github.com/RM-Zhang/SCPNet.
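Mean average corner error (MACE), the metric quoted above, is commonly computed by warping the four image corners with the estimated and ground-truth homographies and averaging the Euclidean distances between the results. The sketch below follows that common convention; the corner coordinates (pixels 0 and 127 for a 128x128 image) and the function names are assumptions, and SCPNet's evaluation code may differ in detail.

```python
import math
from statistics import mean

Homography = list[list[float]]  # 3x3 matrix, row-major


def warp_point(h: Homography, x: float, y: float) -> tuple[float, float]:
    """Apply a 3x3 homography to (x, y) using homogeneous coordinates."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def average_corner_error(h_est: Homography, h_gt: Homography, size: int = 128) -> float:
    """Mean Euclidean distance between the four image corners warped by each homography."""
    last = size - 1.0
    corners = [(0.0, 0.0), (last, 0.0), (0.0, last), (last, last)]
    errors = []
    for x, y in corners:
        xe, ye = warp_point(h_est, x, y)
        xg, yg = warp_point(h_gt, x, y)
        errors.append(math.hypot(xe - xg, ye - yg))
    return mean(errors)


def mace(pairs: list[tuple[Homography, Homography]], size: int = 128) -> float:
    """Average corner error, averaged again over (estimated, ground-truth) pairs."""
    return mean(average_corner_error(h_est, h_gt, size) for h_est, h_gt in pairs)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]  # 3 px right, 4 px down
print(mace([(shifted, identity), (identity, identity)]))  # 2.5
```

A pure translation moves every corner by the same 5 px, so the first pair contributes 5 and the perfect estimate contributes 0.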
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
MSTF: Multiscale Transformer for Incomplete Trajectory Prediction
Authors:
Zhanwen Liu,
Chao Li,
Nan Yang,
Yang Wang,
Jiaqi Ma,
Guangliang Cheng,
Xiangmo Zhao
Abstract:
Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assume complete observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such oversights inevitably compromise the accuracy of trajectory predictions. To tackle this challenge, we propose an end-to-end framework, termed Multiscale Transformer (MSTF), designed specifically for incomplete trajectory prediction. MSTF integrates a Multiscale Attention Head (MAH) and an Information Increment-based Pattern Adaptive (IIPA) module. Specifically, the MAH component concurrently captures multiscale motion representations of trajectory sequences at various temporal granularities, using a multi-head attention mechanism. This approach facilitates the modeling of global dependencies in motion across different scales, thereby mitigating the adverse effects of missing values. Additionally, the IIPA module adaptively extracts a continuity representation of motion across time steps by analyzing missing patterns in the data. The continuity representation delineates the motion trend at a higher level, guiding MSTF to generate predictions consistent with motion continuity. We evaluate MSTF on two large-scale real-world datasets. Experimental results demonstrate that MSTF surpasses state-of-the-art (SOTA) models in incomplete trajectory prediction, showing its efficacy in addressing the challenges posed by missing values in motion forecasting for autonomous driving systems.
Submitted 8 July, 2024;
originally announced July 2024.
-
Measurement of microwave photon correlations at millikelvin with a thermal detector
Authors:
Aarne Keränen,
Qi-Ming Chen,
András Gunyhó,
Priyank Singh,
Jian Ma,
Visa Vesterinen,
Joonas Govenius,
Mikko Möttönen
Abstract:
Microwave photons are important carriers of quantum information in many promising platforms for quantum computing. They can be routinely generated, controlled, and teleported in experiments, suggesting a variety of applications in quantum technology. However, observing the quantum statistical properties of microwave photons remains demanding: the energy of several microwave photons is considerably smaller than the thermal fluctuation of any room-temperature detector, while amplification necessarily induces noise. Here, we present a measurement technique with a nanobolometer that directly measures the photon statistics at millikelvin temperatures and overcomes this trade-off. We apply our method to thermal states generated by a blackbody radiator operating in the regime of circuit quantum electrodynamics. We demonstrate the photon number resolvedness of the nanobolometer and reveal the n(n+1)-scaling law of the photon number variance, as predicted by the Bose–Einstein distribution. By engineering the coherent and incoherent proportions of the input field, we observe the transition between super-Poissonian and Poissonian statistics of the microwave photons from the bolometric second-order correlation measurement. This technique is poised to serve in fundamental tests of quantum mechanics with microwave photons and to function as a scalable readout solution for a quantum information processor.
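The two statistical signatures mentioned above can be checked numerically. For a thermal state with mean photon number $\bar{n}$, the Bose–Einstein distribution $P(n) = \bar{n}^n/(\bar{n}+1)^{n+1}$ gives variance $\bar{n}(\bar{n}+1)$ and $g^{(2)}(0) = \langle n(n-1)\rangle/\langle n\rangle^2 = 2$ (super-Poissonian), whereas a coherent state has Poissonian statistics with variance $\bar{n}$ and $g^{(2)}(0) = 1$. The sketch below only evaluates these textbook distributions; it does not model the nanobolometer or its readout, and the truncation `n_max` is an arbitrary choice.

```python
import math


def bose_einstein(nbar: float, n: int) -> float:
    """Thermal (Bose-Einstein) photon-number distribution, written to avoid overflow."""
    return (nbar / (nbar + 1.0)) ** n / (nbar + 1.0)


def poisson(nbar: float, n: int) -> float:
    """Coherent-state (Poisson) photon-number distribution, via logs to avoid overflow."""
    return math.exp(-nbar + n * math.log(nbar) - math.lgamma(n + 1))


def photon_statistics(pmf, nbar: float, n_max: int = 500) -> tuple[float, float, float]:
    """Return (mean, variance, g2) of a photon-number distribution truncated at n_max."""
    probs = [pmf(nbar, n) for n in range(n_max + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    second = sum(n * n * p for n, p in enumerate(probs))
    variance = second - mean**2
    g2 = (second - mean) / mean**2  # <n(n-1)> / <n>^2
    return mean, variance, g2


for name, pmf in [("thermal", bose_einstein), ("coherent", poisson)]:
    m, var, g2 = photon_statistics(pmf, nbar=2.0)
    print(f"{name}: mean={m:.3f} variance={var:.3f} g2={g2:.3f}")
# thermal: mean=2.000 variance=6.000 g2=2.000
# coherent: mean=2.000 variance=2.000 g2=1.000
```

Mixing coherent and thermal light gives $1 < g^{(2)}(0) < 2$, which is the interpolation the abstract describes between the super-Poissonian and Poissonian regimes.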
Submitted 6 July, 2024;
originally announced July 2024.
-
UAV-Assisted Weather Radar Calibration: A Theoretical Model for Wind Influence on Metal Sphere Reflectivity
Authors:
Jiabiao Zhao,
Da Li,
Jiayuan Cui,
Houjun Sun,
Jianjun Ma
Abstract:
The calibration of weather radar for detecting meteorological phenomena has advanced rapidly, with the aim of improving accuracy. Using an unmanned aerial vehicle (UAV) equipped with a suspended metal sphere offers an efficient calibration method, because the UAV's position can be adjusted dynamically, effectively turning it into a mobile calibration platform. However, external factors such as wind can introduce bias into reflectivity measurements by causing the sphere to deviate from its intended position. This study develops a theoretical model to assess the impact of the metal sphere's one-dimensional oscillation on reflectivity. The findings offer valuable insights for UAV-based radar calibration efforts.
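As a rough illustration of why a displaced sphere biases the measurement, the sketch below uses textbook ingredients rather than the paper's model: the optical-region radar cross section of a conducting sphere, $\sigma = \pi a^2$, the $R^{-4}$ range dependence of the radar equation, and a Gaussian approximation of the antenna's one-way power pattern, $G(\theta)/G_0 = \exp(-4\ln 2\,(\theta/\theta_{3\,\mathrm{dB}})^2)$. The function names, beamwidth, range, and displacement values are hypothetical.

```python
import math


def sphere_rcs(radius_m: float) -> float:
    """Optical-region RCS (m^2) of a conducting sphere; valid when radius >> wavelength."""
    return math.pi * radius_m**2


def two_way_pattern_loss_db(offset_rad: float, beamwidth_rad: float) -> float:
    """Two-way power loss (dB) for a target off boresight, Gaussian-beam approximation.

    beamwidth_rad is the full one-way 3 dB beamwidth.
    """
    one_way = math.exp(-4.0 * math.log(2.0) * (offset_rad / beamwidth_rad) ** 2)
    return -10.0 * math.log10(one_way**2)


def lateral_swing_bias_db(displacement_m: float, range_m: float, beamwidth_deg: float) -> float:
    """Apparent power drop when the sphere swings sideways off the beam axis."""
    offset = math.atan2(displacement_m, range_m)
    return two_way_pattern_loss_db(offset, math.radians(beamwidth_deg))


def radial_shift_bias_db(range_m: float, shift_m: float) -> float:
    """Received-power change (dB) from the R**-4 law if the sphere moves along the beam."""
    return 40.0 * math.log10(range_m / (range_m + shift_m))


# Hypothetical numbers: 1 degree beam, sphere 2 km away, 5 m sideways swing.
print(round(lateral_swing_bias_db(5.0, 2000.0, 1.0), 3))
```

At half the 3 dB beamwidth off axis the one-way gain halves, so the two-way loss is about 6 dB; small swings produce much smaller but still systematic biases.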
Submitted 20 June, 2024;
originally announced July 2024.
-
RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation
Authors:
Yuxuan Kuang,
Junjie Ye,
Haoran Geng,
Jiageng Mao,
Congyue Deng,
Leonidas Guibas,
He Wang,
Yue Wang
Abstract:
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation, dubbed RAM, featuring generalizability across various objects, environments, and embodiments. Unlike existing approaches that learn manipulation from expensive in-domain demonstrations, RAM capitalizes on a retrieval-based affordance transfer paradigm to acquire versatile manipulation capabilities from abundant out-of-domain data. First, RAM extracts unified affordance at scale from diverse sources of demonstrations, including robotic data, human-object interaction (HOI) data, and custom data, to construct a comprehensive affordance memory. Then, given a language instruction, RAM hierarchically retrieves the most similar demonstration from the affordance memory and transfers such out-of-domain 2D affordance to in-domain 3D executable affordance in a zero-shot and embodiment-agnostic manner. Extensive simulation and real-world evaluations demonstrate that RAM consistently outperforms existing works in diverse daily tasks. Additionally, RAM shows significant potential for downstream applications such as automatic and efficient data collection, one-shot visual imitation, and LLM/VLM-integrated long-horizon manipulation. For more details, please check our website at https://yxkryptonite.github.io/RAM/.
Submitted 5 July, 2024;
originally announced July 2024.
-
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Authors:
Ye Bai,
**g** Chen,
Jitong Chen,
Wei Chen,
Zhuo Chen,
Chuang Ding,
Linhao Dong,
Qianqian Dong,
Yujiao Du,
Kepan Gao,
Lu Gao,
Yi Guo,
Minglun Han,
Ting Han,
Wenchao Hu,
Xinying Hu,
Yuxiang Hu,
Deyu Hua,
Lu Huang,
Mingkun Huang,
Youjia Huang,
Jishuo **,
Fanliu Kong,
Zongwei Lan,
Tianyu Li
, et al. (30 additional authors not shown)
Abstract:
A modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data-matched scenarios, and they are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM)-based speech recognition model. Seed-ASR is developed based on the framework of the audio-conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in the LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects, and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves a 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.
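Word error rate (WER) and character error rate (CER), the metrics behind the reported 10%-40% reductions, are both the Levenshtein edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference tokens; CER simply uses characters as tokens, which suits Chinese text without word boundaries. The minimal sketch below is a generic implementation, not Seed-ASR's scoring pipeline, which may also apply text normalization before scoring.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over whitespace-separated words."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces (useful for Chinese text)."""
    ref = list(reference.replace(" ", ""))
    return edit_distance(ref, list(hypothesis.replace(" ", ""))) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error-rate reduction, e.g. 0.10 -> 0.07 is a 30% reduction."""
    return (baseline - new) / baseline


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
print(cer("今天天气很好", "今天天气不好"))  # 1 error / 6 characters
```

A "reduction" in this sense is relative: going from 10% to 7% WER is a 30% reduction, not a 3% one.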
Submitted 10 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Compositions of the Hercules-Aquila Cloud and Virgo Over-density
Authors:
Dashuang Ye,
Cuihua Du,
Mingji Deng,
Jiwei Liao,
Yang Huang,
Jianrong Shi,
Jun Ma
Abstract:
Based on a sample of K giants from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) Data Release 8 and a sample of RR Lyrae (RRL) stars from \textit{Gaia} Data Release 3, we investigate the compositions of the Hercules-Aquila Cloud (HAC) and Virgo Over-density (VOD) and their collective contribution to the tilt and triaxiality of the stellar halo ($r < 40\,{\rm kpc}$), as well as to two breaks at $\approx 15\,{\rm kpc}$ and $30\,{\rm kpc}$. We apply a Gaussian mixture model (GMM) to divide the stellar halo into an isotropic component and a radially biased anisotropic component, namely Gaia-Sausage-Enceladus (GSE), and find that both HAC and VOD are dominated by GSE debris stars, with weights of $0.67^{+0.09}_{-0.07}$ and $0.57^{+0.07}_{-0.06}$, respectively. In addition, using the K giants with orbital parameters, we identify the member stars of known substructures, including GSE, Sagittarius (Sgr), Helmi Streams, Sequoia, Thamnos, Pontus, Wukong, and the Metal-weak Thick Disk (MWTD), to probe the compositions of low-eccentricity stars in the HAC and VOD regions. In density fittings of the RRL sample, we note that the absence of HAC and VOD has only a weak effect on the shape of the halo. Finally, we find that the radially biased anisotropic halo contributes the majority of the stellar halo, which can be modelled with a tilted triaxial ellipsoid and a doubly broken power law with breaking radii at $18.08^{+2.04}_{-3.22}\,{\rm kpc}$ and $33.03^{+1.30}_{-1.21}\,{\rm kpc}$. This is important for understanding the status of large diffuse over-densities in the Milky Way.
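A doubly broken power law of the kind fitted above can be written, with continuity imposed at the two breaks, as below. This is a generic parameterization offered for illustration: $r_1 \approx 18.08\,{\rm kpc}$ and $r_2 \approx 33.03\,{\rm kpc}$ are the breaking radii quoted in the abstract, the slopes $\alpha_1, \alpha_2, \alpha_3$ are not given there, and for a tilted triaxial halo $r$ would typically be replaced by an ellipsoidal radius $r_q = \sqrt{x'^2 + y'^2/p^2 + z'^2/q^2}$ measured in a rotated frame $(x', y', z')$ with axis ratios $p$ and $q$ (assumed notation).

```latex
\rho(r) \propto
\begin{cases}
  r^{-\alpha_1}, & r \le r_1, \\
  r_1^{\alpha_2 - \alpha_1}\, r^{-\alpha_2}, & r_1 < r \le r_2, \\
  r_1^{\alpha_2 - \alpha_1}\, r_2^{\alpha_3 - \alpha_2}\, r^{-\alpha_3}, & r > r_2.
\end{cases}
```

The prefactors make adjacent pieces agree at the breaks; for example, at $r = r_1$ the second line reduces to $r_1^{\alpha_2-\alpha_1} r_1^{-\alpha_2} = r_1^{-\alpha_1}$.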
Submitted 4 July, 2024;
originally announced July 2024.
-
Controlling quasi-parametric amplifications: From multiple PT-symmetry phase transitions to non-Hermitian sensing
Authors:
Xiaoxiong Wu,
Kai Bai,
Penghong Yu,
Zhaohui Dong,
Yanyan He,
**gui Ma,
Vladislav V. Yakovlev,
Meng Xiao,
Xianfeng Chen,
Luqi Yuan
Abstract:
Quasi-parametric amplification (QPA) is a nonlinear interaction in which the idler wave is depleted through some loss mechanism. QPA plays an important role in signal amplification in ultrafast photonics and in quantum light generation. The QPA process has a number of features characterized by non-Hermitian parity-time ($\mathcal{PT}$) symmetry. In this report, we explore new interaction regimes and uncover multiple $\mathcal{PT}$-symmetry phase transitions in such QPA processes, where the transitions are particularly sensitive to external parameters. In particular, we demonstrate the feasibility of detecting $10^{-11}$ inhomogeneities of the doped absorber, which is an order of magnitude more sensitive than similar measurements performed in a linear absorption regime. In doing so, we reveal a family of $\mathcal{PT}$-symmetry phase transitions appearing in the QPA process and provide a novel nonlinear optical sensing mechanism for precise optical measurements.
Submitted 3 July, 2024;
originally announced July 2024.
-
An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges
Authors:
Laifa Tao,
Shangyu Li,
Haifei Liu,
Qixuan Huang,
Liang Ma,
Guoao Ning,
Yiling Chen,
Yunlong Wu,
Bin Li,
Weiwei Zhang,
Zhengduo Zhao,
Wenchao Zhan,
Wenyan Cao,
Chao Wang,
Hongmei Liu,
Jian Ma,
Mingliang Suo,
Yujie Cheng,
Yu Ding,
Dengwei Song,
Chen Lu
Abstract:
Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, and other fields. However, PHM's development is constrained by bottlenecks in generalization, interpretation, and verification abilities. Presently, generative artificial intelligence (AI), represented by the Large Model, heralds a technological revolution with the potential to fundamentally reshape traditional technological fields and human production methods. Its capabilities, including strong generalization, reasoning, and generative attributes, present opportunities to address PHM's bottlenecks. To this end, based on a systematic analysis of the current challenges and bottlenecks in PHM, as well as the research status and advantages of the Large Model, we propose a novel concept and three progressive paradigms of the Prognosis and Health Management Large Model (PHM-LM), obtained by integrating the Large Model with PHM. Subsequently, we provide feasible technical approaches for PHM-LM to bolster PHM's core capabilities within the framework of the three paradigms. Moreover, to address the core issues confronting PHM, we discuss a series of technical challenges for PHM-LM throughout the entire process of construction and application. This comprehensive effort offers a holistic PHM-LM technical framework and provides avenues for new PHM technologies, methodologies, tools, platforms, and applications, which could also transform the design, research and development, verification, and application modes of PHM. Furthermore, a new generation of AI-enabled PHM can thereby be realized, moving from custom to generalized, from discriminative to generative, and from theoretical conditions to practical applications.
Submitted 1 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high-precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10\,087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the BESIII detector at the BEPCII storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$, where the first uncertainty is the combined statistical uncertainty and the second the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
-
SOAF: Scene Occlusion-aware Neural Acoustic Field
Authors:
Huiyu Gao,
Jiahao Ma,
David Ahmedt-Aristizabal,
Chuong Nguyen,
Miaomiao Liu
Abstract:
This paper tackles the problem of novel-view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly wall occlusion of sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation. Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling and then transforms it based on scene transmittance learned from the input video. We extract features from the local acoustic field centred around the receiver using a Fibonacci sphere to generate binaural audio for novel views with a direction-aware attention mechanism. Extensive experiments on the real dataset RWAVS and the synthetic dataset SoundSpaces demonstrate that our method outperforms previous state-of-the-art techniques in audio generation. Project page: https://github.com/huiyu-gao/SOAF/.
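A Fibonacci sphere is a standard way to place $N$ nearly uniformly spaced directions on a sphere by spacing heights evenly and stepping the azimuth by the golden angle. The sketch below generates such directions and offsets them around a receiver position; the radius, the number of points, and the helper `sample_around` are hypothetical, and how SOAF actually queries features at these points is not specified in the abstract.

```python
import math


def fibonacci_sphere(n: int) -> list[tuple[float, float, float]]:
    """Return n nearly uniform unit vectors using the golden-angle spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n    # evenly spaced heights in (-1, 1)
        radius = math.sqrt(1.0 - z * z)  # radius of the horizontal circle at height z
        theta = golden_angle * i
        points.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return points


def sample_around(center: tuple[float, float, float], radius: float, n: int):
    """Hypothetical helper: n sample positions on a sphere around a receiver."""
    cx, cy, cz = center
    return [(cx + radius * x, cy + radius * y, cz + radius * z)
            for x, y, z in fibonacci_sphere(n)]


samples = sample_around((1.0, 2.0, 1.5), radius=0.5, n=64)
```

Because successive points are separated by an irrational fraction of a turn, no two points line up in azimuth, which avoids the clustering near the poles that a regular latitude-longitude grid produces.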
Submitted 2 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Authors:
Jian Ma,
Yonglin Deng,
Chen Chen,
Haonan Lu,
Zhenyu Yang
Abstract:
Posters play a crucial role in marketing and advertising, contributing significantly to industrial design by enhancing visual communication and brand visibility. With recent advances in controllable text-to-image diffusion models, a growing body of research now focuses on rendering text within synthetic images. Despite improvements in text rendering accuracy, end-to-end poster generation remains underexplored. This complex task involves striking a balance between text rendering accuracy and automated layout to produce high-resolution images with variable aspect ratios. To tackle this challenge, we propose an end-to-end text rendering framework employing a triple cross-attention mechanism rooted in align learning, designed to create precise poster text within detailed contextual backgrounds. Additionally, we introduce a high-resolution dataset with image resolutions exceeding 1024 pixels. Our approach leverages the SDXL architecture. Extensive experiments validate the ability of our method to generate poster images featuring intricate and contextually rich backgrounds. Code will be available at https://github.com/OPPO-Mente-Lab/GlyphDraw2.
Submitted 2 July, 2024;
originally announced July 2024.
-
FAFE: Immune Complex Modeling with Geodesic Distance Loss on Noisy Group Frames
Authors:
Ruidong Wu,
Ruihan Guo,
Rui Wang,
Shitong Luo,
Yue Xu,
Jiahan Li,
Jianzhu Ma,
Qiang Liu,
Yunan Luo,
Jian Peng
Abstract:
Despite the striking success of general protein folding models such as AlphaFold2 (AF2; Jumper et al., 2021), the accurate computational modeling of antibody-antigen complexes remains a challenging task. In this paper, we first analyze AF2's primary loss function, known as the Frame Aligned Point Error (FAPE), and identify a previously overlooked issue: FAPE tends to suffer from vanishing gradients on targets with high rotational error. To address this fundamental limitation, we propose a novel geodesic loss called Frame Aligned Frame Error (FAFE, denoted as F2E to distinguish it from FAPE), which enables the model to better optimize both the rotational and translational errors between two frames. We then prove that F2E can be reformulated as a group-aware geodesic loss, which translates the optimization of the residue-to-residue error into optimizing the group-to-group geodesic frame distance. By fine-tuning AF2 with our proposed loss function, we attain a correct rate of 52.3\% (DockQ $>$ 0.23) on an evaluation set and a 43.8\% correct rate on a subset with low homology, substantial improvements of 182\% and 100\%, respectively, over AF2.
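The geodesic distance between two rotations $R_1, R_2 \in SO(3)$ is the angle of the relative rotation $R_1^\top R_2$, namely $\theta = \arccos\big((\operatorname{tr}(R_1^\top R_2) - 1)/2\big)$, which ranges over $[0, \pi]$. The sketch below computes this angle together with a plain translation distance for two frames. It only illustrates the ingredients of a frame-to-frame error; the abstract does not state how F2E weights or combines the rotational and translational terms, so the function `frame_errors` is hypothetical.

```python
import math

Matrix3 = list[list[float]]
Vector3 = tuple[float, float, float]


def transpose_matmul(a: Matrix3, b: Matrix3) -> Matrix3:
    """Return a^T @ b for 3x3 matrices."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_geodesic(r1: Matrix3, r2: Matrix3) -> float:
    """Angle (radians, in [0, pi]) of the relative rotation r1^T r2."""
    rel = transpose_matmul(r1, r2)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.acos(cos_angle)


def frame_errors(r1: Matrix3, t1: Vector3, r2: Matrix3, t2: Vector3) -> tuple[float, float]:
    """Hypothetical: (rotation geodesic, translation distance) between two frames."""
    return rotation_geodesic(r1, r2), math.dist(t1, t2)


def rot_z(angle: float) -> Matrix3:
    """Rotation by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


print(frame_errors(rot_z(0.3), (0.0, 0.0, 0.0), rot_z(1.0), (3.0, 4.0, 0.0)))  # roughly (0.7, 5.0)
```

Unlike a point-based error, the geodesic angle measures rotational disagreement directly, which is the kind of quantity the authors argue a frame-level loss should optimize.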
Submitted 1 July, 2024;
originally announced July 2024.
-
Deep Image-to-Recipe Translation
Authors:
Jiangqin Ma,
Bilal Mawji,
Franz Williams
Abstract:
The modern saying, "You Are What You Eat" resonates on a profound level, reflecting the intricate connection between our identities and the food we consume. Our project, Deep Image-to-Recipe Translation, is an intersection of computer vision and natural language generation that aims to bridge the gap between cherished food memories and the art of culinary creation. Our primary objective involves predicting ingredients from a given food image. For this task, we first develop a custom convolutional network and then compare its performance to a model that leverages transfer learning. We pursue an additional goal of generating a comprehensive set of recipe steps from a list of ingredients. We frame this process as a sequence-to-sequence task and develop a recurrent neural network that utilizes pre-trained word embeddings. We address several challenges of deep learning including imbalanced datasets, data cleaning, overfitting, and hyperparameter selection. Our approach emphasizes the importance of metrics such as Intersection over Union (IoU) and F1 score in scenarios where accuracy alone might be misleading. For our recipe prediction model, we employ perplexity, a commonly used and important metric for language models. We find that transfer learning via pre-trained ResNet-50 weights and GloVe embeddings provide an exceptional boost to model performance, especially when considering training resource constraints. Although we have made progress on the image-to-recipe translation, there is an opportunity for future exploration with advancements in model architectures, dataset scalability, and enhanced user interaction.
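The abstract argues that IoU and F1 are more informative than accuracy for ingredient prediction, and it uses perplexity for recipe generation. The sketch below gives set-level definitions of all three. It assumes ingredient prediction is treated as multi-label set prediction. The project's exact averaging scheme (per-sample versus micro-averaged) is not stated, so these functions are illustrative only.

```python
import math


def iou(predicted, actual):
    """Intersection over Union (Jaccard index) between two ingredient sets."""
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    if not union:
        return 1.0  # both empty: treat as perfect agreement
    return len(predicted & actual) / len(union)


def f1(predicted, actual):
    """Set-based F1: harmonic mean of precision and recall."""
    predicted, actual = set(predicted), set(actual)
    if not predicted and not actual:
        return 1.0
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per generated token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Take a predicted set {flour, egg, sugar} against a true set {flour, egg, butter, milk}. The IoU is 2/5 = 0.4 and the F1 is 4/7 ≈ 0.571. When there are hundreds of possible ingredients, predicting no ingredients at all can still give high per-label accuracy, yet IoU and F1 are both 0. A model that assigns probability 0.25 to every token has perplexity 4.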
Submitted 30 June, 2024;
originally announced July 2024.
-
Magnetic Excitations in Ferromagnetically Coupled Spin-1 Nanographenes
Authors:
Elia Turco,
Fupeng Wu,
Gonçalo Catarina,
Nils Krane,
Ji Ma,
Roman Fasel,
Xinliang Feng,
Pascal Ruffieux
Abstract:
In the quest for high-spin building blocks to form covalently bonded 1D or 2D materials with controlled magnetic interactions, $π$-electron magnetism provides an ideal framework to engineer large ferromagnetic interactions between nanographenes. As a first step in this direction, we investigate the spin properties of ferromagnetically coupled triangulenes, triangular nanographenes with spin $S = 1$. Combining in-solution synthesis of rationally designed molecular precursors and on-surface synthesis, we achieve covalently bonded $S = 2$ triangulene dimers and $S = 3$ trimers on Au(111). Starting from the triangulene dimer, we thoroughly characterize its low-energy magnetic excitations using inelastic electron tunneling spectroscopy (IETS). IETS reveals conductance steps identified as a quintet to triplet excitation, and a zero-bias peak stemming from higher-order spin-spin scattering of the 5-fold degenerate ferromagnetic ground state. The Heisenberg picture captures the relevant parameters of inter-triangulene ferromagnetic exchange, and its successful extension to the larger $S = 3$ system confirms the model's accuracy. We expect that the addition of ferromagnetically coupled building blocks to the toolbox of magnetic nanographenes opens new opportunities to design carbon materials with complex magnetic ground states.
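The quintet-to-triplet step and the 5-fold ground-state degeneracy follow from the simplest two-spin Heisenberg model. The derivation below assumes the convention $H = J\,\mathbf{S}_1\cdot\mathbf{S}_2$ for two $S = 1$ triangulenes, with $J < 0$ ferromagnetic. It is a textbook sketch; the paper's fitted parameters and sign convention are not reproduced here.

```latex
% Assumed model: H = J S_1 . S_2 with s_1 = s_2 = 1 and J < 0 (ferromagnetic)
\begin{aligned}
\mathbf{S}_1\cdot\mathbf{S}_2 &= \tfrac{1}{2}\left[S(S+1) - s_1(s_1+1) - s_2(s_2+1)\right],
  \qquad S \in \{0, 1, 2\},\\
E(S) &= \tfrac{J}{2}\left[S(S+1) - 4\right]:
  \qquad E(2) = J,\quad E(1) = -J,\quad E(0) = -2J,\\
\Delta_{\text{quintet}\to\text{triplet}} &= E(1) - E(2) = -2J = 2|J| \qquad (J < 0),\\
g_{S=2} &= 2S + 1 = 5 .
\end{aligned}
```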
Submitted 30 June, 2024;
originally announced July 2024.
-
CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations
Authors:
Pengying Wu,
Yao Mu,
Kangjie Zhou,
Ji Ma,
Junting Chen,
Chang Liu
Abstract:
Visual navigation tasks are critical for household service robots. As these tasks become increasingly complex, effective communication and collaboration among multiple robots become imperative to ensure successful completion. In recent years, large language models (LLMs) have exhibited remarkable comprehension and planning abilities in the context of embodied agents. However, their application in household scenarios, specifically in the use of multiple agents collaborating to complete complex navigation tasks through communication, remains unexplored. Therefore, this paper proposes a framework for decentralized multi-agent navigation, leveraging LLM-enabled communication and collaboration. By designing the communication-triggered dynamic leadership organization structure, we achieve faster team consensus with fewer communication instances, leading to better navigation effectiveness and collaborative exploration efficiency. With the proposed novel communication scheme, our framework promises to be conflict-free and robust in multi-object navigation tasks, even when there is a surge in team size.
Submitted 30 June, 2024;
originally announced July 2024.
-
Prompt Refinement with Image Pivot for Text-to-Image Generation
Authors:
Jingtao Zhan,
Qingyao Ai,
Yiqun Liu,
Yingwei Pan,
Ting Yao,
Jiaxin Mao,
Shaoping Ma,
Tao Mei
Abstract:
For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP innovatively uses the latent representation of a user-preferred image as an intermediary "pivot" between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and subsequently translating image representations into system languages. Thus, it can leverage abundant data for training. Extensive experiments show that PRIP substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
Submitted 28 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
YuLan: An Open-source Large Language Model
Authors:
Yutao Zhu,
Kun Zhou,
Kelong Mao,
Wentong Chen,
Yiding Sun,
Zhipeng Chen,
Qian Cao,
Yihan Wu,
Yushuo Chen,
Feng Wang,
Lei Zhang,
Junyi Li,
Xiaolei Wang,
Lei Wang,
Beichen Zhang,
Zican Dong,
Xiaoxue Cheng,
Yuhan Chen,
Xinyu Tang,
Yupeng Hou,
Qiangqiang Ren,
Xincheng Pang,
Shufang Xie,
Wayne Xin Zhao,
Zhicheng Dou
, et al. (13 additional authors not shown)
Abstract:
Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billion parameters. The base model of YuLan is pre-trained on approximately $1.7$T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training was completed in January 2024, and the model achieves performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and code are available at https://github.com/RUC-GSAI/YuLan-Chat.
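The abstract does not say how YuLan scores difficulty or schedules its curriculum stages. The sketch below only illustrates the generic easy-to-hard idea: sort examples by a hypothetical `difficulty` callable and split them into consecutive stages. The example uses string length as a stand-in difficulty score.

```python
def curriculum_stages(examples, difficulty, num_stages):
    """Split examples into easy-to-hard stages using a difficulty score.

    `difficulty` is a hypothetical callable mapping an example to a sortable
    score; lower scores are treated as easier.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(examples, key=difficulty)  # stable sort keeps ties in input order
    n = len(ordered)
    # Integer boundaries give near-equal stage sizes and cover every example exactly once.
    return [ordered[i * n // num_stages:(i + 1) * n // num_stages]
            for i in range(num_stages)]
```

For example, `curriculum_stages(["aaaa", "a", "aaa", "aa"], len, 2)` yields `[["a", "aa"], ["aaa", "aaaa"]]`. Training would then proceed stage by stage, and a real system might mix earlier data into later stages.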
Submitted 28 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
Exact results on traces of sets
Authors:
Mingze Li,
Jie Ma,
Mingyuan Rong
Abstract:
For non-negative integers $n$, $m$, $a$ and $b$, we write $\left( n,m \right) \rightarrow \left( a,b \right)$ if for every family $\mathcal{F}\subseteq 2^{[n]}$ with $|\mathcal{F}|\geqslant m$ there is an $a$-element set $T\subseteq [n]$ such that $\left| \mathcal{F}_{\mid T} \right| \geqslant b$, where $\mathcal{F}_{\mid T}=\{ F \cap T : F \in \mathcal{F} \}$. A longstanding problem in extremal set theory asks to determine $m(s)=\lim_{n\rightarrow +\infty}\frac{m(n,s)}{n}$, where $m(n,s)$ denotes the maximum integer $m$ such that $\left( n,m \right) \rightarrow \left( n-1,m-s \right)$ holds for non-negative integers $n$ and $s$. In this paper, we establish the exact value of $m(2^{d-1}-c)$ for all $1\leqslant c\leqslant d$ whenever $d\geqslant 50$, thereby solving an open problem posed by Piga and Schülke. To be precise, we show that $$m(n,2^{d-1}-c)=\frac{2^{d}-c}{d}n \mbox{ for } 1\leq c\leq d-1 \mbox{ and } d\mid n, \mbox{ and }
m(n,2^{d-1}-d)=\frac{2^{d}-d-0.5}{d}n \mbox{ for } 2d\mid n $$ holds for $d\geq 50$. Furthermore, we provide a proof that confirms a conjecture of Frankl and Watanabe from 1994, demonstrating that $m(11)=5.3$.
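To make the arrow notation concrete, the sketch below computes traces $\mathcal{F}_{\mid T}$ and checks $(n,m)\rightarrow(a,b)$ by brute force for very small $n$. It is exponential and meant only for building intuition, not as the paper's method. Testing families of size exactly $m$ is enough, because enlarging a family can only enlarge its traces.

```python
from itertools import combinations


def all_subsets(n):
    """Every subset of [n] = {0, ..., n-1}, as frozensets."""
    ground = range(n)
    return [frozenset(c) for k in range(n + 1) for c in combinations(ground, k)]


def trace(family, t):
    """The trace F|_T = {F ∩ T : F in family}."""
    t = frozenset(t)
    return {f & t for f in family}


def arrow_holds(n, m, a, b):
    """Brute-force check of (n, m) -> (a, b).

    Families of size exactly m suffice: a larger family contains one of
    size m, and its traces are at least as large.
    """
    subsets = all_subsets(n)
    candidate_ts = list(combinations(range(n), a))
    for family in combinations(subsets, m):
        if not any(len(trace(family, t)) >= b for t in candidate_ts):
            return False  # this family is a counterexample
    return True
```

As an arithmetic consistency check, $d = 5$ gives $2^{d-1} - d = 11$ and $(2^5 - 5 - 0.5)/5 = 26.5/5 = 5.3$, which matches $m(11) = 5.3$. The general theorem is stated for $d \geq 50$, so this agreement is an observation, not an application of it.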
Submitted 26 June, 2024;
originally announced June 2024.
-
The neutron array of the compact spectrometer for heavy ion experiments in Fermi energy region
Authors:
Dawei Si,
Sheng Xiao,
Yuhao Qin,
Yijie Wang,
Junhuai Xu,
Baiting Tian,
Boyuan Zhang,
Dong Guo,
Qin Zhi,
Xiaobao Wei,
Yibo Hao,
Zengxiang Wang,
Tianren Zhuo,
Yuansheng Yang,
Xianglun Wei,
Herun Yang,
Peng Ma,
Limin Duan,
Fangfang Duan,
Junbing Ma,
Shiwei Xu,
Zhen Bai,
Guo Yang,
Yanyun Yang,
Zhigang Xiao
Abstract:
The emission of neutrons from heavy ion reactions is an important observable for studying the asymmetric nuclear equation of state and the reaction dynamics. A 20-unit neutron array has been developed and mounted on the compact spectrometer for heavy ion experiments (CSHINE) to measure the neutron spectra, neutron-neutron and neutron-proton correlation functions. Each unit consists of a $\rm 15\times 15\times 15~cm^3$ plastic scintillator coupled to a $φ = 52~\rm mm$ photomultiplier. Geant4 simulations including optical processes are performed to investigate the time resolution and the neutron detection efficiency. An inherent time resolution of 212 ps is obtained from a cosmic-ray coincidence test. The n-$γ$ discrimination and time-of-flight performance are characterized with a $\rm ^{252}Cf$ radioactive source test and a beam test. The neutron energy spectra have been obtained in the angle range $30^\circ \le θ_{\rm lab} \le 51^\circ$ in the beam experiment of $^{124}$Sn+$^{124}$Sn at 25 MeV/u with CSHINE.
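Neutron energy spectra from a time-of-flight array rest on the relation between flight time and kinetic energy. The sketch below uses relativistic kinematics, $E_k = m_n c^2(\gamma - 1)$ with $\beta = L/(ct)$. The flight path used in the example is a hypothetical value, not CSHINE's geometry. The sketch also ignores start-time and detector-thickness effects.

```python
import math

NEUTRON_MASS_MEV = 939.56542  # neutron rest energy m_n c^2 in MeV
C_MM_PER_NS = 299.792458      # speed of light in mm/ns


def kinetic_energy_from_tof(path_mm, tof_ns):
    """Neutron kinetic energy (MeV) from flight path (mm) and flight time (ns)."""
    beta = path_mm / (C_MM_PER_NS * tof_ns)
    if not 0.0 < beta < 1.0:
        raise ValueError("time of flight is unphysical for a massive particle")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return NEUTRON_MASS_MEV * (gamma - 1.0)


def tof_from_kinetic_energy(path_mm, energy_mev):
    """Inverse relation: flight time (ns) for a given kinetic energy (MeV)."""
    gamma = 1.0 + energy_mev / NEUTRON_MASS_MEV
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    return path_mm / (C_MM_PER_NS * beta)
```

Over an assumed 1 m flight path, a 25 MeV neutron arrives after roughly 15 ns. A time resolution of a few hundred picoseconds is therefore a percent-level fraction of the flight time at such energies.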
Submitted 20 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
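The asymmetry $R$ defined above is a simple ratio, and the sketch below evaluates it together with first-order error propagation. The propagation assumes the two branching-fraction inputs are uncorrelated, and the paper's treatment of correlated systematics may differ. The $pK_S^0$ inputs are external and are not reproduced here, so the tests use only illustrative numbers.

```python
import math


def ks_kl_asymmetry(b_ks, b_kl):
    """R = (B(K_S X) - B(K_L X)) / (B(K_S X) + B(K_L X))."""
    total = b_ks + b_kl
    if total <= 0:
        raise ValueError("branching fractions must sum to a positive value")
    return (b_ks - b_kl) / total


def ks_kl_asymmetry_error(b_ks, sigma_ks, b_kl, sigma_kl):
    """First-order uncertainty on R, assuming uncorrelated inputs.

    dR/dB_S = 2 B_L / (B_S + B_L)^2 and dR/dB_L = -2 B_S / (B_S + B_L)^2.
    """
    total = b_ks + b_kl
    if total <= 0:
        raise ValueError("branching fractions must sum to a positive value")
    return 2.0 / total ** 2 * math.hypot(b_kl * sigma_ks, b_ks * sigma_kl)
```

Separately, treating the statistical and systematic uncertainties of $\mathcal{B}(Λ_c^+\to pK_{L}^{0}) = (1.67 \pm 0.06 \pm 0.04)\%$ as independent and adding them in quadrature gives about $\pm 0.072\%$ (`math.hypot(0.06, 0.04)`).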
Submitted 26 June, 2024;
originally announced June 2024.
-
Spectrum and low-energy gap in triangular quantum spin liquid NaYbSe$_2$
Authors:
A. O. Scheie,
Minseong Lee,
Kevin Wang,
P. Laurell,
E. S. Choi,
D. Pajerowski,
Qingming Zhang,
Jie Ma,
H. D. Zhou,
Sangyun Lee,
S. M. Thomas,
M. O. Ajeesh,
P. F. S. Rosa,
Ao Chen,
Vivien S. Zapf,
M. Heyl,
C. D. Batista,
E. Dagotto,
J. E. Moore,
D. Alan Tennant
Abstract:
We report neutron scattering, pressure-dependent AC calorimetry, and AC magnetic susceptibility measurements of triangular lattice NaYbSe$_2$. We observe a continuum of scattering, which is reproduced by matrix product simulations, and no phase transition is detected in any bulk measurements. Comparison to heat capacity simulations suggests the material is within the Heisenberg spin liquid phase. AC susceptibility shows a significant downturn at 23~mK, indicating a gap in the magnetic spectrum. The combination of a gap with no detectable magnetic order, comparison to theoretical models, and comparison to other $A$YbSe$_2$ compounds all strongly indicate NaYbSe$_2$ is within the quantum spin liquid phase. The gap also allows us to rule out a gapless Dirac spin liquid, with a gapped $\mathbb{Z}_2$ liquid the most natural explanation.
Submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models
Authors:
Yang Yan,
Lizhi Ma,
Anqi Li,
Jingsong Ma,
Zhenzhong Lan
Abstract:
Accurate assessment of personality traits is crucial for effective psycho-counseling, yet traditional methods like self-report questionnaires are time-consuming and biased. This study examines whether Large Language Models (LLMs) can predict the Big Five personality traits directly from counseling dialogues and introduces an innovative framework to perform the task. Our framework applies role-play and questionnaire-based prompting to condition LLMs on counseling sessions, simulating client responses to the Big Five Inventory. We evaluated our framework on 853 real-world counseling sessions and found a significant correlation between LLM-predicted and actual Big Five traits, which supports the validity of the framework. Moreover, ablation studies highlight the importance of role-play simulations and task simplification via questionnaires in enhancing prediction accuracy. Meanwhile, our fine-tuned Llama3-8B model, utilizing Direct Preference Optimization with Supervised Fine-Tuning, achieves a 130.95% improvement, surpassing the state-of-the-art Qwen1.5-110B by 36.94% in personality prediction validity. In conclusion, LLMs can predict personality based on counseling dialogues. Our code and model are publicly available at https://github.com/kuri-leo/BigFive-LLM-Predictor, providing a valuable tool for future research in computational psychometrics.
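The abstract reports a "significant correlation" between predicted and actual trait scores and a percentage improvement, but it does not name the coefficient. The sketch below assumes the Pearson correlation coefficient and reads the improvement figure as a simple relative change. Both choices are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def relative_improvement_pct(new, baseline):
    """Relative change of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0
```

Under the relative-change reading, a 130.95% improvement means the new validity score is about 2.31 times the baseline.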
Submitted 25 June, 2024;
originally announced June 2024.
-
Quantitative Global Carbon Inequality Network
Authors:
Yanming Guo,
Charles Guan,
** Ma
Abstract:
International trading networks significantly influence global economic conditions and environmental outcomes. A notable imbalance between economic gains and emissions transfers persists, manifesting as carbon inequality. This study introduces a novel metric, the Ecological Economic Equality Index, integrated with complex network dynamics analysis, to quantitatively evaluate the evolving roles within the global trading network and to pinpoint inequities in trade relationships from 1995 to 2022. Utilising high spatiotemporal resolution data from the Environmentally Extended Multi-regional Input-output model, our findings reveal a widening disparity in carbon inequality and dynamic patterns. This analysis emphasises the gap in regional carbon inequality and identifies unequal trade. The study underscores that carbon inequality is a critical challenge affecting both developing and developed regions, demanding widespread attention and action.
Submitted 6 July, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, a search for the $e^+e^- \to φχ_{c1}(3872)$ process is performed for the first time. No significant signal is observed, and the upper limits at the 90% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information on the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding of the nature of this particle.
Submitted 21 June, 2024;
originally announced June 2024.
-
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
Authors:
Qi Liu,
Bo Wang,
Nan Wang,
Jiaxin Mao
Abstract:
Recent studies have demonstrated the effectiveness of using large language models (LLMs) in passage ranking. Listwise approaches, such as RankGPT, have become the new state of the art in this task. However, the efficiency of RankGPT models is limited by the maximum context length and the relatively high latency of LLM inference. To address these issues, in this paper, we propose PE-Rank, which leverages a single embedding per passage as an effective form of context compression for efficient listwise passage reranking. By treating each passage as a special token, we can directly input passage embeddings into LLMs, thereby reducing input length. Additionally, we introduce an inference method that dynamically constrains the decoding space to these special tokens, accelerating the decoding process. To adapt the model to reranking, we employ a listwise learning-to-rank loss for training. Evaluation results on multiple benchmarks demonstrate that PE-Rank significantly improves efficiency in both prefilling and decoding, while maintaining competitive ranking effectiveness. The code is available at https://github.com/liuqi6777/pe_rank.
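The constrained-decoding idea restricts every generation step to the passage special tokens that have not yet been emitted, so the output is always a permutation of the passages. The sketch below illustrates that restriction under stated assumptions: `next_token_logits` is a hypothetical stand-in for the LLM's next-token scores, and the passage-embedding projection and training loss are omitted. It is not the released implementation.

```python
import math


def constrained_rank(next_token_logits, passage_tokens):
    """Greedy decoding restricted to passage special tokens.

    next_token_logits: hypothetical callable that takes the tokens emitted so
        far and returns a dict {token_id: logit} over the vocabulary.
    passage_tokens: ids of the special tokens that stand for the passages.
    Returns the passage tokens in ranked order (a permutation).
    """
    remaining = list(passage_tokens)
    ranking = []
    while remaining:
        logits = next_token_logits(ranking)
        # Constrain the decoding space: only not-yet-emitted passage tokens
        # are eligible, so ordinary vocabulary tokens can never be chosen.
        best = max(remaining, key=lambda tok: logits.get(tok, -math.inf))
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

Take a toy scorer returning `{7: 9.0, 100: 0.2, 101: 2.5, 102: 1.1}` at every step with passages `[100, 101, 102]`. The sketch returns `[101, 102, 100]`. Token 7 has the highest logit but is never emitted, and the loop runs exactly once per passage.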
Submitted 20 June, 2024;
originally announced June 2024.
-
Individually Addressed Entangling Gates in a Two-Dimensional Ion Crystal
Authors:
Y. -H. Hou,
Y. -J. Yi,
Y. -K. Wu,
Y. -Y. Chen,
L. Zhang,
Y. Wang,
Y. -L. Xu,
C. Zhang,
Q. -X. Mei,
H. -X. Yang,
J. -Y. Ma,
S. -A. Guo,
J. Ye,
B. -X. Qi,
Z. -C. Zhou,
P. -Y. Hou,
L. -M. Duan
Abstract:
Two-dimensional (2D) ion crystals have become a promising way to scale up qubit numbers for ion trap quantum information processing. However, to realize universal quantum computing in this system, individually addressed high-fidelity two-qubit entangling gates still remain challenging due to the inevitable micromotion of ions in a 2D crystal as well as the technical difficulty in 2D addressing. Here we demonstrate two-qubit entangling gates between any ion pairs in a 2D crystal of four ions. We use symmetrically placed crossed acousto-optic deflectors (AODs) to drive Raman transitions and achieve an addressing crosstalk error below 0.1%. We design and demonstrate a gate sequence by alternately addressing two target ions, making it compatible with any single-ion addressing techniques without crosstalk from multiple addressing beams. We further examine the gate performance versus the micromotion amplitude of the ions and show that its effect can be compensated by a recalibration of the laser intensity without degrading the gate fidelity. Our work paves the way for ion trap quantum computing with hundreds to thousands of qubits on a 2D ion crystal.
Submitted 20 June, 2024;
originally announced June 2024.
-
Causal Inference with Latent Variables: Recent Advances and Future Prospectives
Authors:
Yaochen Zhu,
Yinhan He,
**g Ma,
Mengxuan Hu,
Sheng Li,
Jundong Li
Abstract:
Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, the lack of observation of important variables (e.g., confounders, mediators, exogenous variables) severely compromises the reliability of CI methods. The issue may arise from the inherent difficulty of measuring the variables. Additionally, in observational studies where variables are passively recorded, certain covariates might be inadvertently omitted by the experimenter. Depending on the type of unobserved variables and the specific CI task, various consequences can be incurred if these latent variables are carelessly handled, such as biased estimation of causal effects, incomplete understanding of causal mechanisms, and lack of individual-level causal consideration. In this survey, we provide a comprehensive review of recent developments in CI with latent variables. We start by discussing traditional CI techniques when variables of interest are assumed to be fully observed. Afterward, under the taxonomy of circumvention and inference-based methods, we provide an in-depth discussion of various CI strategies to handle latent variables, covering the tasks of causal effect estimation, mediation analysis, counterfactual reasoning, and causal discovery. Furthermore, we generalize the discussion to graph data where interference among units may exist. Finally, we offer fresh perspectives on further advancing CI with latent variables, especially new opportunities in the era of large language models (LLMs).
Submitted 19 June, 2024;
originally announced June 2024.
-
Demonstration of High-Efficiency Microwave Heating Producing Record Highly Charged Xenon Ion Beams with Superconducting ECR Ion Sources
Authors:
X. Wang,
J. B. Li,
V. Mironov,
J. W. Guo,
X. Z. Zhang,
O. Tarvainen,
Y. C. Feng,
L. X. Li,
J. D. Ma,
Z. H. Zhang,
W. Lu,
S. Bogomolov,
L. Sun,
H. W. Zhao
Abstract:
Intense highly charged ion beam production is essential for high-power heavy ion accelerators. A novel movable Vlasov launcher for superconducting high charge state Electron Cyclotron Resonance (ECR) ion sources has been devised that can improve the microwave power effectiveness by a factor of about 4 in terms of highly charged ion beam production. This approach, based on a dedicated microwave launching system instead of the traditional coupling scheme, has led to new insight into microwave-plasma interaction. With this new understanding, the world-record highly charged xenon ion beam currents have been enhanced by up to a factor of 2, which could directly and significantly enhance the performance of heavy ion accelerators and provide many new research opportunities in nuclear physics, atomic physics, and other disciplines.
Submitted 25 June, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
SwinStyleformer is a favorable choice for image inversion
Authors:
Jiawei Mao,
Guangyi Zhao,
Xuesong Yin,
Yuanqi Chang
Abstract:
This paper proposes SwinStyleformer, the first pure Transformer-based inversion network, which can compensate for the shortcomings of CNN-based inversion frameworks by handling long-range dependencies and learning the global structure of objects. Experiments found that an inversion network with a Transformer backbone could not successfully invert images. This failure arises from differences between CNNs and Transformers, such as self-attention weights favoring image structure while ignoring image details (compared to convolution), the lack of multi-scale properties in Transformers, and the distribution differences between the latent code extracted by the Transformer and the StyleGAN style vector. To address these differences, we employ the Swin Transformer with a smaller window size as the backbone of SwinStyleformer to enhance the local detail of the inverted image. Meanwhile, we design a Transformer block based on learnable queries. Compared to the self-attention Transformer block, the learnable-query block provides greater adaptability and flexibility, enabling the model to update the attention weights according to specific tasks. Thus, the inversion focus is not limited to the image structure. To further introduce multi-scale properties, we design multi-scale connections in the extraction of feature maps. Multi-scale connections allow the model to gain a comprehensive understanding of the image and avoid the loss of detail caused by global modeling. Moreover, we propose an inversion discriminator and a distribution alignment loss to minimize the distribution differences. Based on these designs, SwinStyleformer solves the Transformer's inversion failure issue and demonstrates SOTA performance in image inversion and several related vision tasks.
Submitted 18 June, 2024;
originally announced June 2024.
-
Restorer: Solving Multiple Image Restoration Tasks with One Set of Parameters
Authors:
Jiawei Mao,
Xuesong Yin,
Yuanqi Chang
Abstract:
Although there are many excellent solutions in image restoration, the fact that they are designed for a single restoration task may prevent them from achieving state-of-the-art (SOTA) performance on other types of restoration tasks. While some approaches do consider multiple image restoration tasks, they are still insufficient for real-world requirements and may suffer from the task confusion issue. In this work, we focus on designing a unified and effective solution for multiple image restoration tasks, including deraining, desnowing, defogging, deblurring, denoising, and low-light enhancement. To this end, we propose Restorer, a Transformer network with a U-Net architecture. To effectively deal with degraded information across multiple image restoration tasks, we need a more comprehensive attention mechanism. Thus, we design all-axis attention (AAA) through stereo embedding and 3D convolution, which simultaneously models long-range dependencies in both spatial and channel dimensions, capturing potential correlations among all axes. Moreover, we propose a Restorer based on textual prompts. Compared to previous methods that employ learnable queries, textual prompts bring explicit task priors that resolve the task confusion issue arising from learnable queries, and they introduce interactivity. Based on these designs, Restorer demonstrates SOTA or comparable performance on multiple image restoration tasks compared to both universal image restoration frameworks and methods designed specifically for these individual tasks. Meanwhile, Restorer is faster during inference. These results, along with real-world test results, show that Restorer has the potential to serve as a backbone for multiple real-world image restoration tasks.
Submitted 18 June, 2024;
originally announced June 2024.
-
Deploying scalable traffic prediction models for efficient management in real-world large transportation networks during hurricane evacuations
Authors:
Qinhua Jiang,
Brian Yueshuai He,
Changju Lee,
Jiaqi Ma
Abstract:
Accurate traffic prediction is vital for effective traffic management during hurricane evacuation. This paper proposes a predictive modeling system that integrates Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) models to capture both long-term congestion patterns and short-term speed patterns. Leveraging various input variables, including archived traffic data, spatial-temporal road network information, and hurricane forecast data, the framework is designed to address challenges posed by heterogeneous human behaviors, limited evacuation data, and hurricane event uncertainties. Deployed in a real-world traffic prediction system in Louisiana, the model achieved 82% accuracy in predicting long-term congestion states over a 6-hour period during a 7-day hurricane-impacted duration. The short-term speed prediction model exhibited Mean Absolute Percentage Errors (MAPEs) ranging from 7% to 13% across evacuation horizons from 1 to 6 hours. Evaluation results underscore the model's potential to enhance traffic management during hurricane evacuations, and real-world deployment highlights its adaptability and scalability in diverse hurricane scenarios within extensive transportation networks.
Submitted 17 June, 2024;
originally announced June 2024.
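The speed-prediction results above are reported as MAPE. For reference, a minimal sketch of the standard definition, MAPE = (100/n) Σ |(y_i − ŷ_i) / y_i|, is shown below; the sample speeds in the usage note are hypothetical and not taken from the paper.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # Each term divides by the actual value, so zeros make MAPE undefined.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For example, hypothetical observed speeds of 50, 40, and 60 mph predicted as 45, 44, and 60 mph give per-sample errors of 10%, 10%, and 0%, so `mape([50.0, 40.0, 60.0], [45.0, 44.0, 60.0])` is about 6.67%. Because each error is relative to the observed value, the metric weights a given absolute miss more heavily at low (congested) speeds.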
-
TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy
Authors:
Yiqun Chen,
Qi Liu,
Yi Zhang,
Weiwei Sun,
Daiting Shi,
Jiaxin Mao,
Dawei Yin
Abstract:
Large Language Models (LLMs) are increasingly employed in zero-shot document ranking, yielding commendable results. However, several significant challenges persist in using LLMs for ranking: (1) LLMs are constrained by limited input length, which prevents them from processing a large number of documents simultaneously; (2) the output document sequence is influenced by the input order of documents, resulting in inconsistent ranking outcomes; (3) achieving a balance between cost and ranking performance is challenging. To tackle these issues, we introduce TourRank, a novel document ranking method inspired by the tournament mechanism. This approach alleviates the impact of the LLM's limited input length through intelligent grouping, while a tournament-like points system ensures robust ranking and mitigates the influence of the document input order. We test TourRank with different LLMs on the TREC DL datasets and the BEIR benchmark. Experimental results show that TourRank achieves state-of-the-art performance at a reasonable cost.
Submitted 17 June, 2024;
originally announced June 2024.
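The grouping-plus-points idea described above can be illustrated with a simplified tournament scheme. This is a sketch loosely inspired by the abstract, not TourRank's actual algorithm: the group size, the advancement ratio, the one-point-per-round scoring, and the `select_top` callback (a stand-in for an LLM prompt that picks the most relevant documents in a group) are all assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence


def tournament_rank(
    docs: Sequence[str],
    select_top: Callable[[List[str], int], List[str]],
    group_size: int = 4,
    advance_ratio: float = 0.5,
    tournaments: int = 3,
    seed: int = 0,
) -> List[str]:
    """Rank docs by points accumulated over several shuffled tournaments.

    Hypothetical scheme: select_top(group, k) stands in for an LLM call that
    returns the k most relevant documents of a small group. Surviving a
    round earns one point.
    """
    if group_size < 2 or not 0.0 < advance_ratio < 1.0:
        raise ValueError("need group_size >= 2 and 0 < advance_ratio < 1")
    rng = random.Random(seed)
    points: Dict[str, int] = {d: 0 for d in docs}
    for _ in range(tournaments):
        pool = list(docs)
        rng.shuffle(pool)  # a fresh input order each tournament
        while len(pool) > 1:
            survivors: List[str] = []
            for start in range(0, len(pool), group_size):
                group = pool[start:start + group_size]  # small enough for one prompt
                k = max(1, int(len(group) * advance_ratio))
                survivors.extend(select_top(group, k))
            for d in survivors:
                points[d] += 1
            pool = survivors
    # Stable sort: ties keep their original input order.
    return sorted(docs, key=lambda d: points[d], reverse=True)
```

With a toy relevance table in place of an LLM, for example `rel = {"a": 1, "b": 5, "c": 3, "d": 2, "e": 4, "f": 0}` and `pick = lambda group, k: sorted(group, key=lambda d: rel[d], reverse=True)[:k]`, the most relevant document "b" wins every group it joins and finishes first. Each call only ever sees `group_size` documents, and summing points over several shuffled tournaments is what dampens sensitivity to any single input order.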
-
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
Authors:
Lingrui Mei,
Shenghua Liu,
Yiwei Wang,
Baolong Bi,
Jiayi Mao,
Xueqi Cheng
Abstract:
"Jailbreak" is a major safety concern of Large Language Models (LLMs), which occurs when malicious prompts lead LLMs to produce harmful outputs, raising issues about the reliability and safety of LLMs. Therefore, an effective evaluation of jailbreaks is very crucial to develop its mitigation strategies. However, our research reveals that many jailbreaks identified by current evaluations may actual…
▽ More
"Jailbreak" is a major safety concern of Large Language Models (LLMs), which occurs when malicious prompts lead LLMs to produce harmful outputs, raising issues about the reliability and safety of LLMs. Therefore, an effective evaluation of jailbreaks is very crucial to develop its mitigation strategies. However, our research reveals that many jailbreaks identified by current evaluations may actually be hallucinations-erroneous outputs that are mistaken for genuine safety breaches. This finding suggests that some perceived vulnerabilities might not represent actual threats, indicating a need for more precise red teaming benchmarks. To address this problem, we propose the $\textbf{B}$enchmark for reli$\textbf{AB}$ilit$\textbf{Y}$ and jail$\textbf{B}$reak ha$\textbf{L}$l$\textbf{U}$cination $\textbf{E}$valuation (BabyBLUE). BabyBLUE introduces a specialized validation framework including various evaluators to enhance existing jailbreak benchmarks, ensuring outputs are useful malicious instructions. Additionally, BabyBLUE presents a new dataset as an augmentation to the existing red teaming benchmarks, specifically addressing hallucinations in jailbreaks, aiming to evaluate the true potential of jailbroken LLM outputs to cause harm to human society.
Submitted 17 June, 2024;
originally announced June 2024.
-
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Authors:
Di Wang,
Meiqi Hu,
Yao **,
Yuchun Miao,
Jiaqi Yang,
Yichu Xu,
Xiaolei Qin,
Jiaqi Ma,
Lingyu Sun,
Chenxing Li,
Chuan Fu,
Hongruixuan Chen,
Chengxi Han,
Naoto Yokoya,
**g Zhang,
Minqiang Xu,
Lin Liu,
Lefei Zhang,
Chen Wu,
Bo Du,
Dacheng Tao,
Liangpei Zhang
Abstract:
Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA, a vision transformer-based foundation model for HSI interpretation, scalable to over a billion parameters. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, and real-world applicability.
Submitted 17 June, 2024;
originally announced June 2024.
-
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Authors:
Shengkang Wang,
Hongzhan Lin,
Ziyang Luo,
Zhen Ye,
Guang Chen,
**g Ma
Abstract:
Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning. These models embed multimodal facts within their parameters, rather than relying on external knowledge bases to store factual information explicitly. However, the content discerned by LVLMs may deviate from actual facts due to inherent bias or incorrect inference. To address this issue, we introduce MFC-Bench, a rigorous and comprehensive benchmark designed to evaluate the factual accuracy of LVLMs across three tasks: Manipulation, Out-of-Context, and Veracity Classification. Using MFC-Bench, we benchmark 12 diverse and representative LVLMs and find that current models still fall short in multimodal fact-checking and are insensitive to various forms of manipulated content. We hope that MFC-Bench will draw attention to trustworthy artificial intelligence potentially assisted by LVLMs in the future. The MFC-Bench and accompanying resources are publicly accessible at https://github.com/wskbest/MFC-Bench, contributing to ongoing research in the multimodal fact-checking field.
Submitted 17 June, 2024;
originally announced June 2024.
-
Retraining with Predicted Hard Labels Provably Increases Model Accuracy
Authors:
Rudrajit Das,
Inderjit S. Dhillon,
Alessandro Epasto,
Adel Javanmard,
Jieming Mao,
Vahab Mirrokni,
Sujay Sanghavi,
Peilin Zhong
Abstract:
The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., 1/0 labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable setting with randomly corrupted given labels and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with label differential privacy (DP), which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at no extra privacy cost; we call this consensus-based retraining. For example, when training ResNet-18 on CIFAR-100 with ε = 3 label DP, we obtain a 6.4% improvement in accuracy with consensus-based retraining.
Submitted 17 June, 2024;
originally announced June 2024.
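The consensus-based retraining step described above (keep only samples whose predicted hard label matches the given label, then retrain) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's setup: a hypothetical nearest-centroid classifier replaces ResNet-18, and no label-DP mechanism is modeled.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Classifier = Callable[[Point], int]
FitFn = Callable[[Sequence[Point], Sequence[int]], Classifier]


def fit_nearest_centroid(xs: Sequence[Point], ys: Sequence[int]) -> Classifier:
    """Hypothetical toy binary classifier: predict the nearer class centroid."""
    centroids: Dict[int, Point] = {}
    for label in (0, 1):
        members = [x for x, y in zip(xs, ys) if y == label]
        if not members:
            raise ValueError(f"no training samples with label {label}")
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))

    def predict(x: Point) -> int:
        sq_dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in centroids.items()}
        return min(sq_dist, key=lambda lab: sq_dist[lab])

    return predict


def consensus_retrain(xs: Sequence[Point], given_ys: Sequence[int], fit: FitFn) -> Classifier:
    """Train on the given (noisy) labels, then retrain only on samples whose
    predicted hard label agrees with the given label."""
    initial = fit(xs, given_ys)
    keep: List[int] = [i for i, (x, y) in enumerate(zip(xs, given_ys)) if initial(x) == y]
    # On the consensus set the predicted and given labels coincide.
    return fit([xs[i] for i in keep], [given_ys[i] for i in keep])
```

On hypothetical 1-D data `xs = [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)]` with the label of `(2.0,)` flipped to 1, the initial model disagrees with that corrupted label, so retraining drops it. This moves the class-1 centroid back toward the clean cluster, and the point `(4.5,)` switches from class 1 to class 0.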
-
Learning Iterative Reasoning through Energy Diffusion
Authors:
Yilun Du,
Jiayuan Mao,
Joshua B. Tenenbaum
Abstract:
We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference based on problem difficulty, enabling it to solve problems outside its training distribution, such as more complex Sudoku puzzles, matrix completion with large value magnitudes, and pathfinding in larger graphs. Key to our method's success are two novel techniques: learning a sequence of annealed energy landscapes for easier inference, and a combination of score function and energy landscape supervision for faster and more stable training. Our experiments show that IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks, particularly in more challenging scenarios. Code and visualizations are available at https://energy-based-model.github.io/ired/
Submitted 16 June, 2024;
originally announced June 2024.
-
On Convergence and Rate of Convergence of Policy Improvement Algorithms
Authors:
** Ma,
Gaozhan Wang,
Jianfeng Zhang
Abstract:
In this paper we provide a simple proof from scratch of the convergence of the Policy Improvement Algorithm (PIA) for a continuous-time entropy-regularized stochastic control problem. Such convergence was established by Huang, Wang, and Zhou (2023) using sophisticated PDE estimates for the iterative PDEs involved in the PIA. Our approach builds on Feynman-Kac-type probabilistic representation formulae for solutions of PDEs and their derivatives. Moreover, in the infinite horizon model with a large discount factor and in the finite horizon model, we obtain an exponential rate of convergence with similar arguments. Finally, in the one-dimensional setting, we extend the convergence result to the diffusion control case.
Submitted 20 June, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
-
Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft
Authors:
Ian Vyse,
Rishit Dagli,
Dav Vrat Chadha,
John P. Ma,
Hector Chen,
Isha Ruparelia,
Prithvi Seran,
Matthew Xie,
Eesa Aamer,
Aidan Armstrong,
Naveen Black,
Ben Borstein,
Kevin Caldwell,
Orrin Dahanaggamaarachchi,
Joe Dai,
Abeer Fatima,
Stephanie Lu,
Maxime Michet,
Anoushka Paul,
Carrie Ann Po,
Shivesh Prakash,
Noa Prosser,
Riddhiman Roy,
Mirai Shinjo,
Iliya Shofman
, et al. (4 additional authors not shown)
Abstract:
Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and spatial information, it is prone to various types of noise, including random noise, stripe noise, and dead pixels. Effective denoising of these images is crucial for downstream scientific tasks. Traditional methods, including hand-crafted techniques encoding strong priors, learned 2D image denoising methods applied across different hyperspectral bands, and diffusion generative models applied independently to each band, often struggle with varying noise strengths across spectral bands, leading to significant spectral distortion. This paper presents a novel approach to hyperspectral image denoising using latent diffusion models that integrate spatial and spectral information. Specifically, we build a 3D diffusion model and present a 3-stage training approach on real and synthetically crafted datasets. The proposed method preserves image structure while reducing noise. Evaluations on both popular hyperspectral denoising datasets and synthetically crafted datasets for the FINCH mission demonstrate the effectiveness of this approach.
Submitted 15 June, 2024;
originally announced June 2024.
-
On automorphism groups of polar codes
Authors:
Jicheng Ma,
Guiying Yan
Abstract:
Over the past years, Polar codes have arisen as a highly effective class of linear codes, equipped with a decoding algorithm of low computational complexity. This family of codes shares a common algebraic formalism with the well-known Reed-Muller codes, which involves monomial evaluations. Both belong to the class of algebraic codes known as decreasing monomial codes, and much decoding work on Reed-Muller codes has exploited their rich code automorphisms. In 2021, a new permutation group decoder, referred to as the automorphism ensemble (AE) decoder, was introduced. This decoder can be applied to Polar codes and has been shown to produce similar decoding effects. However, identifying the right set of code automorphisms to enhance decoding performance for Polar codes remains a challenging task. This paper aims to characterize the full automorphism group of Polar codes. We prove a reduction theorem that reduces the problem of determining the full automorphism group of arbitrary random Polar codes to that of a specified class of Polar codes. In addition, we give an exact classification of the full automorphism groups of families of Polar codes that are constructed using the Reed-Muller codes.
Submitted 14 June, 2024;
originally announced June 2024.
-
Diffuse X-ray Explorer: a high-resolution X-ray spectroscopic sky surveyor on the China Space Station
Authors:
Hai **,
Junjie Mao,
Liubiao Chen,
Naihui Chen,
Wei Cui,
Bo Gao,
**** Li,
Xinfeng Li,
Jiejia Liu,
Jia Quan,
Chunyang Jiang,
Guole Wang,
Le Wang,
Qian Wang,
Sifan Wang,
Aimin Xiao,
Shuo Zhang
Abstract:
DIffuse X-ray Explorer (DIXE) is a proposed high-resolution X-ray spectroscopic sky surveyor on the China Space Station (CSS). DIXE will focus on studying hot baryons in the Milky Way. Galactic hot baryons like the X-ray emitting Milky Way halo and eROSITA bubbles are best observed in the sky survey mode with a large field of view. DIXE will take advantage of the orbital motion of the CSS to scan a large fraction of the sky. High-resolution X-ray spectroscopy, enabled by superconducting microcalorimeters based on the transition-edge sensor (TES) technology, will probe the physical properties (e.g., temperature, density, elemental abundances, kinematics) of the Galactic hot baryons. This will complement the high-resolution imaging data obtained with the eROSITA mission. Here we present the preliminary design of DIXE. The payload consists mainly of a detector assembly and a cryogenic cooling system. The key components of the detector assembly are a microcalorimeter array and frequency-domain multiplexing readout electronics. To provide a working temperature for the detector assembly, the cooling system consists of an adiabatic demagnetization refrigerator and a mechanical cryocooler system.
Submitted 14 June, 2024;
originally announced June 2024.
-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $J/ψ\to ωX(1870) \to ωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching fraction $B(J/ψ\to ωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
Submitted 13 June, 2024;
originally announced June 2024.