-
Light nuclei photoproduction in relativistic heavy ion ultraperipheral collisions
Authors:
**-Yu Hu,
Shuo Lin,
Shi Pu,
Qun Wang
Abstract:
We have investigated light nuclei pair photoproduction in relativistic heavy ion ultraperipheral collisions. As a first attempt, we employ our previously developed quantum electrodynamics model, which incorporates a wave-packet description of the initial nuclei, to compute the cross section for proton-antiproton pair photoproduction. The effective vertex for the photon-proton interaction is chosen based on studies of two-photon exchange effects in hadron physics. We present the transverse momentum, invariant mass, and azimuthal angle distributions of proton-antiproton pairs at $\sqrt{s_{NN}}=200$ GeV in Au+Au ultraperipheral collisions. We observe a $\cos(2\phi)$ modulation and an almost negligible $\cos(4\phi)$ modulation in the azimuthal angle distribution. Our study helps us better understand matter generated by light.
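The strength of a $\cos(n\phi)$ modulation like the ones quoted above is commonly summarized by the weighted average $\langle\cos n\phi\rangle$ over the azimuthal distribution. A minimal sketch, assuming the distribution is available as angles with weights (the helper name `azimuthal_moment` and the toy amplitude `a2` are hypothetical; the paper's own definition of $\phi$ and normalization may differ):

```python
import numpy as np

def azimuthal_moment(phi, weights, n):
    """Weighted average of cos(n*phi): the strength of a cos(n*phi) modulation."""
    phi = np.asarray(phi, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.cos(n * phi)) / np.sum(weights))

# Toy distribution dN/dphi proportional to 1 + 2*a2*cos(2*phi) on a uniform grid.
# a2 = 0.1 is an arbitrary illustration value, not a result from the paper.
phi_grid = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
a2 = 0.1
w = 1.0 + 2.0 * a2 * np.cos(2.0 * phi_grid)

print(azimuthal_moment(phi_grid, w, 2))  # recovers a2
print(azimuthal_moment(phi_grid, w, 4))  # essentially zero: no cos(4*phi) term in the toy
```

On a uniform grid the cosines are orthogonal, so the $n=2$ moment returns the input amplitude and the $n=4$ moment vanishes; with real event samples the same function gives a statistical estimate.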
Submitted 8 July, 2024;
originally announced July 2024.
-
A Survey of Controllable Learning: Methods and Applications in Information Retrieval
Authors:
Chenglei Shen,
Xiao Zhang,
Teng Shi,
Changshuo Zhang,
Guofu Xie,
Jun Xu
Abstract:
Controllable learning (CL) is emerging as a critical component of trustworthy machine learning, ensuring that learners meet predefined targets and can adapt to changes in those targets without retraining. We provide a formal definition of CL and discuss its applications in information retrieval (IR), where information needs are often complex and dynamic. The survey categorizes CL according to who controls (users or platforms), what is controllable (e.g., retrieval objectives, users' historical behaviors, controllable environmental adaptation), how control is implemented (e.g., rule-based methods, Pareto optimization, hypernetworks), and where to implement control (e.g., pre-processing, in-processing, post-processing methods). Then, we identify challenges faced by CL across training, evaluation, task setting, and deployment in online environments. Additionally, we outline promising directions for CL in theoretical analysis, efficient computation, empowering large language models, application scenarios, and evaluation frameworks in IR.
Submitted 4 July, 2024;
originally announced July 2024.
-
Few-Shot Keyword Spotting from Mixed Speech
Authors:
Junming Yuan,
Ying Shi,
LanTian Li,
Dong Wang,
Askar Hamdulla
Abstract:
Few-shot keyword spotting (KWS) aims to detect unknown keywords with limited training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spotting, i.e., simultaneously detecting multiple keywords blended in an utterance, which is crucial in real-world applications. Previous research has proposed a Mix-Training (MT) approach to this problem; however, it has never been tested in the few-shot scenario. In this paper, we investigate using MT and other relevant methods to address the two practical challenges together: few-shot learning and mixed speech. Experiments on the LibriSpeech and Google Speech Commands corpora demonstrate that MT is highly effective on this task when employed in either the pre-training or the fine-tuning phase. Moreover, combining SSL-based large-scale pre-training (HuBERT) with MT fine-tuning yields very strong results in all test conditions.
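The abstract describes Mix-Training only at a high level. One common way to build mixed training examples, assumed here purely for illustration, is to overlay two keyword utterances at a chosen signal-to-interference ratio and train against a multi-hot target that marks both keywords. The helper `mix_examples` and its `snr_db` parameter are hypothetical and not taken from the paper:

```python
import numpy as np

def mix_examples(wav_a, wav_b, label_a, label_b, num_classes, snr_db=0.0):
    """Overlay two equal-length waveforms and return (mixture, multi-hot target)."""
    wav_a = np.asarray(wav_a, dtype=float)
    wav_b = np.asarray(wav_b, dtype=float)
    # Scale wav_b so that power(wav_a) / power(scaled wav_b) equals snr_db in decibels.
    power_a = np.mean(wav_a ** 2)
    power_b = np.mean(wav_b ** 2)
    scale = np.sqrt(power_a / (power_b * 10.0 ** (snr_db / 10.0)))
    mixture = wav_a + scale * wav_b
    # Multi-hot target: both keywords are present in the mixture.
    target = np.zeros(num_classes)
    target[[label_a, label_b]] = 1.0
    return mixture, target

# Usage with random placeholder audio (1 s at an assumed 16 kHz sampling rate).
rng = np.random.default_rng(0)
mix, target = mix_examples(rng.standard_normal(16000), rng.standard_normal(16000),
                           label_a=3, label_b=7, num_classes=10, snr_db=5.0)
```

A multi-hot target pairs naturally with a per-class sigmoid (binary cross-entropy) loss rather than a softmax, since more than one keyword can be active at once.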
Submitted 4 July, 2024;
originally announced July 2024.
-
Faraday laser pumped cesium beam clock
Authors:
Hangbo Shi,
Xiaomin Qin,
Haijun Chen,
Yufei Yan,
Ziqi Lu,
Zhiyang Wang,
Zijie Liu,
Xiaolei Guan,
Qiang Wei,
Tiantian Shi,
**gbiao Chen
Abstract:
We realize a high-performance, compact, optically pumped cesium beam clock that uses a Faraday laser simultaneously as the pumping and detection laser. The Faraday laser, frequency-stabilized by the modulation transfer spectroscopy (MTS) technique, has a narrow linewidth and superior frequency stability. Measured by optical heterodyne between two identical systems, the linewidth of the Faraday laser is 2.5 kHz after MTS locking, and its fractional frequency stability is optimized to $1.8\times10^{-12}/\sqrt{\tau}$. Based on this high-performance Faraday laser, the cesium beam clock achieves a signal-to-noise ratio (SNR) of $39600$ in a 1 Hz bandwidth when the cesium oven temperature is 130°C. Compared in frequency against a hydrogen maser, the fractional frequency stability of the Faraday laser pumped cesium beam clock reaches $1.3\times10^{-12}/\sqrt{\tau}$ and drops to $1.4\times10^{-14}$ at 10000 s when the cesium oven temperature is 110°C. This Faraday laser pumped cesium beam clock demonstrates excellent performance and great potential in timekeeping, navigation, and communication. Meanwhile, the Faraday laser, as a high-performance optical frequency standard, can also contribute to other applications in quantum metrology, precision measurement, and atomic physics.
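The stabilities above are quoted in the white-frequency-noise form $\sigma_y(\tau)=\sigma_1/\sqrt{\tau}$, where $\sigma_1$ is the fractional frequency stability at $\tau = 1$ s. As a rough consistency check, assuming pure white frequency noise (which ignores flicker floors and drift), projecting $1.3\times10^{-12}/\sqrt{\tau}$ to $\tau = 10000$ s gives $1.3\times10^{-12}/100 = 1.3\times10^{-14}$, close to the measured $1.4\times10^{-14}$. A minimal sketch of that projection (the function name is hypothetical):

```python
import math

def white_fm_allan_deviation(sigma_1s, tau_s):
    """Allan deviation at averaging time tau_s, assuming pure white frequency noise."""
    return sigma_1s / math.sqrt(tau_s)

# Short-term stability of the clock from the abstract, projected to 10 000 s.
print(white_fm_allan_deviation(1.3e-12, 1e4))
```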
Submitted 8 July, 2024;
originally announced July 2024.
-
Revisiting the Ultraviolet Tail of the Primordial Gravitational Wave
Authors:
Shi Pi,
Misao Sasaki,
Ao Wang,
Jianing Wang
Abstract:
High-frequency primordial gravitational waves (PGWs), with wave numbers larger than the Hubble parameter at the end of inflation, originate from ultraviolet (UV) modes that are never stretched outside the horizon. This UV tail of the PGW energy spectrum has a spurious logarithmic divergence. We study the origin of this divergence and find that it comes from treating the transition from inflation to the post-inflationary era as instantaneous; it can be removed by considering a transition of finite duration. For the first time, we obtain a semi-analytical expression for the PGW energy spectrum. We find that the UV tail decays exponentially, with a decay rate that depends solely on the transition rate. When there is a stiff post-inflationary stage, the enhanced PGW spectrum displays a characteristic shape: a power-law increase followed by an exponential decay. We propose a fitting formula that can be used in signal searches.
Submitted 8 July, 2024;
originally announced July 2024.
-
Near-Optimal MIMO Detection Using Gradient-Based MCMC in Discrete Spaces
Authors:
Xingyu Zhou,
Le Liang,
**g Zhang,
Chao-Kai Wen,
Shi **
Abstract:
The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to this issue. However, existing gradient-based MCMC detectors are heuristically designed and thus lack theoretical justification. To bridge this gap, we introduce a novel sampling algorithm tailored for discrete spaces. This algorithm leverages gradients from the underlying continuous space for acceleration while maintaining the validity of probabilistic sampling. We prove the convergence of this method and analyze its convergence rate using both MCMC theory and empirical diagnostics. On this basis, we develop a MIMO detector that precisely samples from the target discrete distribution and uses these samples to generate posterior Bayesian estimates, so its performance is theoretically guaranteed. Furthermore, the proposed detector is highly parallelizable and scalable to large MIMO dimensions, making it a compelling candidate for next-generation wireless networks. Simulation results show that our detector achieves near-optimal performance, significantly outperforms state-of-the-art baselines, and is resilient to various system setups.
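The abstract does not give the detector's update rule. To illustrate the general idea it describes (gradients of a continuous relaxation guiding proposals in a discrete space, with a Metropolis-Hastings correction keeping the sampler exact), here is a sketch for BPSK symbols using a discrete-Langevin-style proposal. This is a swapped-in textbook technique, not the authors' algorithm; the names `gradient_mcmc_detect`, `flip_logits`, and the step parameter `alpha` are hypothetical:

```python
import numpy as np

def log_target(x, H, y, sigma2):
    """Unnormalized log posterior of BPSK symbols x given y = Hx + Gaussian noise."""
    r = y - H @ x
    return -float(r @ r) / (2.0 * sigma2)

def flip_logits(x, H, y, sigma2, alpha):
    """Per-coordinate logits for proposing the flip x_i -> -x_i."""
    grad = H.T @ (y - H @ x) / sigma2  # gradient of the log target in continuous space
    # Half the first-order change from a flip, grad_i * (-2 x_i) / 2, minus a step penalty 2/alpha
    return -grad * x - 2.0 / alpha

def log_proposal(flips, logits):
    """Log probability of a flip pattern under independent Bernoulli(sigmoid(logits))."""
    log_p_flip = -np.logaddexp(0.0, -logits)
    log_p_stay = -np.logaddexp(0.0, logits)
    return float(np.sum(np.where(flips, log_p_flip, log_p_stay)))

def gradient_mcmc_detect(H, y, sigma2, n_iter=500, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = H.shape[1]
    x = rng.choice([-1.0, 1.0], size=n)
    running_sum = np.zeros(n)
    for _ in range(n_iter):
        logits = flip_logits(x, H, y, sigma2, alpha)
        # Every coordinate decides independently, so this step vectorizes across antennas
        flips = rng.random(n) < np.exp(-np.logaddexp(0.0, -logits))
        x_new = np.where(flips, -x, x)
        # The reverse move flips the same coordinates back, scored with gradients at x_new
        logits_rev = flip_logits(x_new, H, y, sigma2, alpha)
        log_accept = (log_target(x_new, H, y, sigma2) - log_target(x, H, y, sigma2)
                      + log_proposal(flips, logits_rev) - log_proposal(flips, logits))
        if np.log(rng.random()) < log_accept:
            x = x_new
        running_sum += x
    # Posterior-mean estimate from the samples, hard-decided to +/-1
    return np.where(running_sum >= 0.0, 1.0, -1.0)
```

The accept/reject step is what separates a valid sampler from a heuristic one: without it, the gradient-biased proposal would not leave the posterior invariant.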
Submitted 8 July, 2024;
originally announced July 2024.
-
Deep Learning-based CSI Feedback in Wi-Fi Systems
Authors:
Fan Qi,
Jiajia Guo,
Yiming Cui,
Xiangyi Li,
Chao-Kai Wen,
Shi **
Abstract:
In Wi-Fi systems, channel state information (CSI) plays a crucial role in enabling access points (APs) to perform beamforming. However, the feedback overhead associated with CSI significantly limits throughput gains. Recent advances in deep learning (DL) have transformed CSI feedback in cellular systems. Drawing inspiration from these successes in mobile communications, this paper introduces a DL-based CSI feedback framework, named EFNet, tailored for Wi-Fi systems. The proposed framework leverages an autoencoder to achieve precise feedback with minimal overhead: the station uses the encoder to compress and quantize a series of matrices into codeword bit streams, which are fed back to the AP, and the decoder at the AP then reconstructs beamforming matrices from these bit streams. We implement EFNet using standard Wi-Fi equipment operating in the 2.4 GHz band. Experiments in an office environment show an 80.77% reduction in feedback overhead compared with the 802.11ac standard, along with a net throughput gain of up to 30.72%.
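In an autoencoder feedback scheme, the payload is the encoder's quantized codeword, so overhead is roughly the codeword length times the bits per element. The sketch below shows one simple option, uniform scalar quantization of values in [0, 1] serialized to a bit stream and decoded back; EFNet's actual quantizer and codeword size are not given in the abstract, and the function names are hypothetical:

```python
import numpy as np

def quantize_to_bits(codeword, n_bits):
    """Uniformly quantize values in [0, 1] and serialize them as a flat bit array."""
    levels = 2 ** n_bits
    q = np.round(np.clip(codeword, 0.0, 1.0) * (levels - 1)).astype(np.int64)
    shifts = np.arange(n_bits - 1, -1, -1)  # most significant bit first
    return ((q[:, None] >> shifts) & 1).astype(np.uint8).ravel()

def bits_to_codeword(bits, n_bits):
    """Inverse of quantize_to_bits: rebuild dequantized values in [0, 1]."""
    levels = 2 ** n_bits
    weights = 1 << np.arange(n_bits - 1, -1, -1)
    q = bits.reshape(-1, n_bits).astype(np.int64) @ weights
    return q / (levels - 1)

# Usage: a 4-element codeword at 4 bits per element costs 16 feedback bits.
latent = np.array([0.0, 0.26, 0.5, 1.0])
bits = quantize_to_bits(latent, 4)
recon = bits_to_codeword(bits, 4)
```

With $B$ bits per element the worst-case reconstruction error is $0.5/(2^B - 1)$, which makes the accuracy-versus-overhead trade-off explicit.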
Submitted 8 July, 2024;
originally announced July 2024.
-
Stochastic Linear-Quadratic Stackelberg Differential Game with Asymmetric Informational Uncertainties: Robust Optimization Approach
Authors:
Na Xiang,
**gtao Shi
Abstract:
This paper is concerned with a two-person zero-sum indefinite stochastic linear-quadratic Stackelberg differential game with asymmetric informational uncertainties, where the leader and the follower face different unknown disturbances. Using a robust optimization approach and soft-constraint analysis, the follower first solves a min-max stochastic linear-quadratic optimal control problem. Then, the leader deals with a max-min stochastic linear-quadratic optimal control problem of forward-backward stochastic differential equations in an augmented space. A state-feedback representation of the robust Stackelberg equilibrium is given in a more explicit form by a decoupling technique, via some Riccati equations.
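The game's own Riccati equations live in an augmented forward-backward space and are not reproduced in the abstract. For orientation only, the sketch below integrates the scalar Riccati equation of a standard deterministic soft-constrained (min-max) linear-quadratic problem, with dynamics $\dot x = ax + bu + dw$ and cost $\int (qx^2 + ru^2 - \gamma^2 w^2)\,dt + gx(T)^2$, namely $-\dot P = 2aP + q - P^2\left(b^2/r - d^2/\gamma^2\right)$, $P(T) = g$, backward in time with RK4. This is the textbook single-controller form, not the paper's equations, and all coefficients are hypothetical:

```python
import math

def riccati_rhs(P, a, b, d, q, r, gamma):
    """dP/dt for the scalar soft-constrained LQ Riccati equation."""
    k = b * b / r - d * d / (gamma * gamma)
    return -(2.0 * a * P + q - k * P * P)

def solve_riccati_backward(T, g, a, b, d, q, r, gamma, n_steps=10000):
    """Integrate P from t = T back to t = 0 with classical RK4 and return P(0)."""
    h = -T / n_steps  # negative step: march backward in time from the terminal condition
    P = g
    for _ in range(n_steps):
        k1 = riccati_rhs(P, a, b, d, q, r, gamma)
        k2 = riccati_rhs(P + 0.5 * h * k1, a, b, d, q, r, gamma)
        k3 = riccati_rhs(P + 0.5 * h * k2, a, b, d, q, r, gamma)
        k4 = riccati_rhs(P + h * k3, a, b, d, q, r, gamma)
        P += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return P

def stationary_solution(a, b, d, q, r, gamma):
    """Positive root of the algebraic Riccati equation (requires b^2/r > d^2/gamma^2)."""
    k = b * b / r - d * d / (gamma * gamma)
    return (a + math.sqrt(a * a + k * q)) / k

# Long horizon: P(0) approaches the stationary solution (2.0 for these coefficients).
P0 = solve_riccati_backward(T=20.0, g=0.0, a=0.5, b=1.0, d=0.5, q=1.0, r=1.0, gamma=1.0)
```

In this textbook setting the state-feedback pair is $u = -(b/r)P(t)x$ for the minimizer and $w = (d/\gamma^2)P(t)x$ for the worst-case disturbance, the scalar analogue of the state-feedback representation mentioned above.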
Submitted 8 July, 2024;
originally announced July 2024.
-
A New Framework for Nonlinear Kalman Filters
Authors:
Shida Jiang,
Junzhe Shi,
Scott Moura
Abstract:
The Kalman filter (KF) is a state estimation algorithm that optimally combines system knowledge and measurements to minimize the mean squared error of the estimated states. While the KF was initially designed for linear systems, numerous extensions, such as the extended Kalman filter (EKF), unscented Kalman filter (UKF), and cubature Kalman filter (CKF), have been proposed for nonlinear systems. Although different types of nonlinear KFs have different pros and cons, they all use the same framework as the linear KF, which, as we find in this paper, tends to give overconfident and less accurate state estimates when the measurement functions are nonlinear. Therefore, in this study, we designed a new framework for nonlinear KFs and showed theoretically and empirically that it estimates the states and covariance matrix more accurately than the old one. The new framework was tested on four different nonlinear KFs and five different tasks, reducing estimation errors by several orders of magnitude in low-measurement-noise conditions with only about a 10 to 90% increase in computational time. All types of nonlinear KFs can benefit from the new framework, and the benefit will grow as sensors become more accurate. For example, the EKF, the simplest nonlinear KF and one previously believed to work poorly for strongly nonlinear systems, can now provide fast and fairly accurate state estimates with the help of the new framework. The code is available at https://github.com/Shida-Jiang/A-new-framework-for-nonlinear-Kalman-filters.
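For reference, the conventional framework the abstract critiques is exemplified by the standard EKF measurement update, which linearizes the measurement function $h$ at the prior mean and then applies the linear-KF formulas. A minimal sketch of that conventional update (not the paper's new framework, whose equations are not given here; the range-sensor example and its numbers are hypothetical):

```python
import numpy as np

def ekf_update(x_prior, P_prior, z, h, jacobian_h, R):
    """Standard EKF measurement update: linearize h at the prior mean."""
    H = jacobian_h(x_prior)                    # measurement Jacobian at the prior mean
    S = H @ P_prior @ H.T + R                  # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_post = x_prior + K @ (z - h(x_prior))    # corrected mean
    P_post = (np.eye(len(x_prior)) - K @ H) @ P_prior  # corrected covariance
    return x_post, P_post

# Usage: a range-only sensor, a nonlinear measurement of a 2-D position.
def h_range(x):
    return np.array([np.hypot(x[0], x[1])])

def jac_range(x):
    r = np.hypot(x[0], x[1])
    return np.array([[x[0] / r, x[1] / r]])

x_post, P_post = ekf_update(np.array([3.0, 4.0]), np.eye(2), np.array([5.2]),
                            h_range, jac_range, np.array([[0.01]]))
```

The covariance update here uses only the first-order linearization of $h$, so it does not account for linearization error; this is one commonly cited source of EKF overconfidence when measurements are precise and $h$ is strongly nonlinear.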
Submitted 8 July, 2024;
originally announced July 2024.
-
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Authors:
Yutong Wu,
Di Huang,
Wenxuan Shi,
Wei Wang,
Lingzhe Gao,
Shihao Liu,
Ziyuan Nan,
Kaizhao Yuan,
Rui Zhang,
Xishan Zhang,
Zidong Du,
Qi Guo,
Yewen Pu,
Dawei Yin,
Xing Hu,
Yunji Chen
Abstract:
Recent advancements in open-source code large language models (LLMs) have demonstrated remarkable coding abilities by fine-tuning on data generated by powerful closed-source LLMs such as GPT-3.5 and GPT-4 for instruction tuning. This paper explores how to further improve an instruction-tuned code LLM by generating data from itself rather than querying closed-source LLMs. Our key observation is the misalignment between the translation of formal and informal languages: translating formal language (i.e., code) to informal language (i.e., natural language) is more straightforward than the reverse. Based on this observation, we propose INVERSE-INSTRUCT, which summarizes instructions from code snippets instead of the reverse. Specifically, given an instruction tuning corpus for code and the resulting instruction-tuned code LLM, we ask the code LLM to generate additional high-quality instructions for the original corpus through code summarization and self-evaluation. Then, we fine-tune the base LLM on the combination of the original corpus and the self-generated one, which yields a stronger instruction-tuned LLM. We present a series of code LLMs named InverseCoder, which surpass the performance of the original code LLMs on a wide range of benchmarks, including Python text-to-code generation, multilingual coding, and data-science code generation.
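The data-generation loop described above can be sketched as follows. In the abstract, summarization and self-evaluation are performed by the instruction-tuned code LLM itself; here they are passed in as the callables `summarize` and `score`, which are hypothetical stand-ins for model calls, and the selection rule (keep the best candidate if it clears a threshold) is an assumption:

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (instruction, code)

def inverse_instruct(corpus: List[Pair],
                     summarize: Callable[[str], List[str]],
                     score: Callable[[str, str], float],
                     threshold: float) -> List[Pair]:
    """Return the original pairs plus self-generated (instruction, code) pairs."""
    generated: List[Pair] = []
    for _, code in corpus:
        # Code -> natural language: the model proposes candidate instructions for the code
        candidates = summarize(code)
        # Self-evaluation: keep the best-scoring candidate if it clears the threshold
        best = max(candidates, key=lambda instr: score(instr, code), default=None)
        if best is not None and score(best, code) >= threshold:
            generated.append((best, code))
    # The base model is then fine-tuned on the combination of both corpora
    return corpus + generated
```

Keeping the original pairs alongside the generated ones mirrors the abstract's "combination of the original corpus and the self-generated one".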
Submitted 8 July, 2024;
originally announced July 2024.
-
Bulk high-temperature superconductivity in the high-pressure tetragonal phase of bilayer La2PrNi2O7
Authors:
Ningning Wang,
Gang Wang,
Xiaoling Shen,
Jun Hou,
Jun Luo,
** Ma,
Huaixin Yang,
Lifen Shi,
Jie Dou,
Jie Feng,
Jie Yang,
Yunqing Shi,
Zhian Ren,
Hanming Ma,
Pengtao Yang,
Ziyi Liu,
Yue Liu,
Hua Zhang,
Xiaoli Dong,
Yuxin Wang,
Kun Jiang,
Jiang** Hu,
Stuart Calder,
Jiaqiang Yan,
Jian** Sun
, et al. (4 additional authors not shown)
Abstract:
The Ruddlesden-Popper (R-P) bilayer nickelate La3Ni2O7 was recently found to show signatures of high-temperature superconductivity (HTSC) at pressures above 14 GPa. Subsequent investigations achieved zero resistance in single- and polycrystalline samples under hydrostatic pressure. Yet clear diamagnetic signals, the other hallmark of superconductivity, are still lacking owing to the filamentary nature of the superconductivity and its low superconducting volume fraction. The presence of a novel "1313" polymorph and competing R-P phases has obscured proper identification of the phase responsible for HTSC. Thus, achieving bulk HTSC and identifying the phase at play are the most pressing tasks at present. Here, we address these issues in praseodymium (Pr)-doped La2PrNi2O7 polycrystalline samples. We find that substituting Pr for La effectively inhibits the intergrowth of different R-P phases, resulting in a nearly pure bilayer structure. For La2PrNi2O7, a pressure-induced orthorhombic-to-tetragonal structural transition takes place at $P_c \approx 11$ GPa, above which HTSC emerges gradually upon further compression. The superconducting transition temperatures at 18-20 GPa reach $T_c^{\mathrm{onset}} = 82.5$ K and $T_c^{\mathrm{zero}} = 60$ K, the highest values among known nickelate superconductors. More importantly, bulk HTSC was confirmed by clear diamagnetic signals below ~75 K, corresponding to an estimated superconducting volume fraction of ~57(5)% at 20 GPa. Our results not only resolve the existing controversies but also illuminate directions for exploring bulk HTSC in the bilayer nickelates.
Submitted 8 July, 2024;
originally announced July 2024.
-
BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space
Authors:
Yumeng Zhang,
Shi Gong,
Kaixin Xiong,
Xiaoqing Ye,
Xiao Tan,
Fan Wang,
Jizhou Huang,
Hua Wu,
Haifeng Wang
Abstract:
World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multimodal tokenizer and the latent BEV sequence diffusion model. The multimodal tokenizer first encodes multimodal information, and its decoder reconstructs the latent BEV tokens into LiDAR and image observations via ray-casting rendering in a self-supervised manner. The latent BEV sequence diffusion model then predicts future scenarios conditioned on action tokens. Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability to generate future scenes and to benefit downstream tasks such as perception and motion prediction. Code will be available at https://github.com/zympsyche/BevWorld.
Submitted 8 July, 2024;
originally announced July 2024.
-
Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification
Authors:
Jiaying Shi,
Xuetong Xue,
Shenghui Xu
Abstract:
Recent CLIP-based methods have shown promising zero-shot and few-shot performance on image classification tasks. Existing approaches such as CoOp and Tip-Adapter focus only on high-level visual features that are fully aligned with textual features representing the "summary" of the image. However, the goal of few-shot learning is to classify unseen images of the same category with few labeled samples. In particular, compared with high-level representations, low-level local representations (LRs) are more consistent between seen and unseen samples. Based on this observation, we propose the Meta-Feature Adaption method (MF-Adapter), which combines the complementary strengths of LRs and high-level semantic representations. Specifically, we introduce the Meta-Feature Unit (MF-Unit), a simple yet effective local similarity metric that measures category-consistent local context in an inductive manner. We then train an MF-Adapter to map image features to the MF-Unit so that intra-class knowledge generalizes adequately between unseen images and the support set. Extensive experiments show that our method outperforms state-of-the-art CLIP-based downstream few-shot classification methods, with even stronger performance on a set of challenging visual classification tasks.
Submitted 8 July, 2024;
originally announced July 2024.
-
A comparative study of ultraluminous infrared galaxies in the IRAS and SDSS Surveys
Authors:
Shaohua Zhang,
Zhijian Luo,
Xiheng Shi,
Chenggan Shu,
Hubing Xiao,
Hongyan Zhou
Abstract:
We present a comprehensive study of ultraluminous infrared galaxies (ULIRGs), leveraging data from the IRAS Faint Source Catalogue (FSC) and the spectroscopic catalog of the Sloan Digital Sky Survey (SDSS) DR16. Our meticulous cross-matching technique significantly enhances the reliability of ULIRG identification, yielding 283 reliable ULIRGs, including 102 new detections, while discarding 120 previously reported false sources. Covering a redshift range of $z = 0.018 - 0.996$, with a median redshift of $\bar{z} = 0.259$, our uniform sample reveals apparent interaction features in approximately 40% of ULIRGs, increasing to 92% for those with $z < 0.1$. Optical spectral analysis indicates that over 58% of ULIRGs host an AGN, twice the fraction identified from infrared colors alone. Moreover, a pronounced excess of radio emission associated with AGN activity results in a steeper radio-far-infrared correlation. Notably, Type I ULIRGs exhibit properties similar to those of narrow-line Seyfert 1 galaxies (NLS1s), with an elevated incidence of Mg II broad absorption lines (BALs; 16.7%), more than ten times that of typical optically selected quasars, consistent with current evolutionary models. We anticipate that forthcoming telescopes such as the China Space Station Telescope (CSST) and the Leighton Chajnantor Telescope (LCT) will provide deeper insights into ULIRG morphology, dust distribution, molecular gas, and AGN activity.
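The abstract does not spell out its matching criteria. A common starting point for catalog cross-matching, assumed here for illustration only, is a nearest-neighbor search within a fixed angular radius using a numerically stable great-circle distance. The paper's technique is described as more careful than this, and the helper names and radius are hypothetical:

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (Vincenty formula, stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    num = math.hypot(math.cos(dec2) * math.sin(dra),
                     math.cos(dec1) * math.sin(dec2)
                     - math.sin(dec1) * math.cos(dec2) * math.cos(dra))
    den = math.sin(dec1) * math.sin(dec2) + math.cos(dec1) * math.cos(dec2) * math.cos(dra)
    return math.degrees(math.atan2(num, den))

def nearest_match(source, catalog, radius_arcsec):
    """Index of the closest (ra, dec) catalog entry within radius_arcsec, else None."""
    best_idx, best_sep = None, radius_arcsec / 3600.0
    for idx, (ra, dec) in enumerate(catalog):
        sep = angular_separation_deg(source[0], source[1], ra, dec)
        if sep <= best_sep:
            best_idx, best_sep = idx, sep
    return best_idx
```

For large catalogs a spatial index (for example a k-d tree on unit vectors) would replace the linear scan, and IRAS positional error ellipses would call for a per-source radius rather than a fixed one.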
Submitted 8 July, 2024;
originally announced July 2024.
-
Flying Calligrapher: Contact-Aware Motion and Force Planning and Control for Aerial Manipulation
Authors:
Xiaofeng Guo,
Guanqi He,
Jiahe Xu,
Mohammadreza Mousaei,
Junyi Geng,
Sebastian Scherer,
Guanya Shi
Abstract:
Aerial manipulation has gained interest for completing high-altitude tasks that are challenging for human workers, such as contact inspection and defect detection. Previous research has focused on maintaining static contact points or forces. This letter addresses a more general and dynamic task: simultaneously tracking a time-varying contact force in the surface normal direction and motion trajectories along the tangential surface. We propose a pipeline that includes a contact-aware trajectory planner to generate dynamically feasible trajectories and a hybrid motion-force controller to track them. We demonstrate the approach in an aerial calligraphy task using a novel sponge pen design as the end-effector, whose stroke width is proportional to the contact force. Additionally, we develop a touchscreen interface for flexible user input. Experiments show that our method can effectively draw diverse letters, achieving an IoU of 0.59 and an end-effector position (force) tracking RMSE of 2.9 cm (0.7 N). Website: https://xiaofeng-guo.github.io/flying-calligrapher/
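The two reported metrics can be computed as follows, assuming the drawn and reference letters are rasterized to binary masks of the same size and the position or force traces are sampled at matching times (both assumptions; the paper's evaluation protocol may differ, and the helper names are hypothetical):

```python
import numpy as np

def mask_iou(drawn, target):
    """Intersection over union of two boolean raster masks."""
    drawn = np.asarray(drawn, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(drawn, target).sum()
    # Two empty masks are treated as a perfect match by convention
    return float(np.logical_and(drawn, target).sum() / union) if union else 1.0

def rmse(actual, reference):
    """Root-mean-square tracking error between two equal-length signals."""
    diff = np.asarray(actual, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```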
Submitted 7 July, 2024;
originally announced July 2024.
-
$\mathrm{E^{2}CFD}$: Towards Effective and Efficient Cost Function Design for Safe Reinforcement Learning via Large Language Model
Authors:
Zepeng Wang,
Chao Ma,
Linjiang Zhou,
Libing Wu,
Lei Yang,
Xiaochuan Shi,
Guojun Peng
Abstract:
Different classes of safe reinforcement learning algorithms have shown satisfactory performance in various types of safety requirement scenarios. However, existing methods mainly address one or a few specific classes of safety requirement scenarios and cannot be applied to arbitrary ones. In addition, the optimization objectives of existing reinforcement learning algorithms are misaligned with the task requirements. To address these issues, we propose $\mathrm{E^{2}CFD}$, an effective and efficient cost function design framework. $\mathrm{E^{2}CFD}$ leverages the capabilities of a large language model (LLM) to comprehend various safety scenarios and generate corresponding cost functions. It incorporates the fast performance evaluation (FPE) method to enable rapid, iterative updates to the generated cost function. Through this iterative process, $\mathrm{E^{2}CFD}$ aims to obtain the most suitable cost function for policy training, tailored to the specific tasks within the safety scenario. Experiments show that policies trained with this framework outperform those trained with traditional safe reinforcement learning algorithms and those trained with carefully designed cost functions.
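The generate-evaluate-refine loop described above can be sketched as a simple search. Here `generate` stands in for the LLM producing candidate cost functions (conditioned on the task and the current best) and `fast_evaluate` stands in for FPE; both callables, the fixed round count, and the keep-the-best rule are hypothetical assumptions, not the framework's actual interface:

```python
from typing import Callable, List, Optional, Tuple

def search_cost_function(generate: Callable[[str, Optional[str], float], List[str]],
                         fast_evaluate: Callable[[str], float],
                         task_description: str,
                         n_rounds: int = 3) -> Tuple[Optional[str], float]:
    """Iteratively propose cost functions and keep the best under a cheap evaluation."""
    best_code, best_score = None, float("-inf")
    for _ in range(n_rounds):
        # The LLM proposes candidates, conditioned on the task and the current best one
        candidates = generate(task_description, best_code, best_score)
        for code in candidates:
            score = fast_evaluate(code)  # quick, reduced-budget performance estimate
            if score > best_score:
                best_code, best_score = code, score
    return best_code, best_score
```

Feeding the current best candidate and its score back into `generate` is what turns independent sampling into iterative refinement.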
Submitted 7 July, 2024;
originally announced July 2024.
-
The infrastructure powering IBM's Gen AI model development
Authors:
Talia Gershon,
Seetharami Seelam,
Brian Belgodere,
Milton Bonilla,
Lan Hoang,
Danny Barnett,
I-Hsin Chung,
Apoorve Mohan,
Ming-Hung Chen,
Lixiang Luo,
Robert Walkup,
Constantinos Evangelinos,
Shweta Salaria,
Marc Dombrowa,
Yoonho Park,
Apo Kayi,
Liran Schour,
Alim Alim,
Ali Sydney,
Pavlos Maniotis,
Laurent Schares,
Bernard Metzler,
Bengi Karacali-Akyamac,
Sophia Wen,
Tatsuhiro Chiba
, et al. (121 additional authors not shown)
Abstract:
AI infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundation models, where thousands of GPUs must sometimes cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software, and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.
Submitted 7 July, 2024;
originally announced July 2024.
-
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Authors:
Ruibo Fu,
Xin Qi,
Zhengqi Wen,
Jianhua Tao,
Tao Wang,
Chunyu Qiang,
Zhiyong Wang,
Yi Lu,
Xiaopeng Wang,
Shuchen Shi,
Yukun Liu,
Xuefei Liu,
Shuai Zhang
Abstract:
Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech (TTS) task, has garnered significant interest due to its numerous applications in multimedia fields. Despite recent advancements, existing methods often struggle with inadequate speaker representation accuracy and overfitting, particularly in scenarios with limited reference speech. To address these challenges, we propose an Agile Speaker Representation Reinforcement Learning (ASRRL) strategy to enhance speaker similarity in speaker adaptation tasks. ASRRL is the first work to apply reinforcement learning (RL) to improve the modeling accuracy of speaker embeddings in speaker adaptation, addressing the challenge of decoupling voice content and timbre. Our approach introduces two action strategies tailored to different reference speech scenarios. In the single-sentence scenario, a knowledge-oriented optimal routine searching RL method is employed to expedite the exploration and retrieval of refinement information on the fringe of speaker representations. In the few-sentence scenario, we utilize a dynamic RL method to adaptively fuse reference speeches, enhancing the robustness and accuracy of speaker modeling. To achieve optimal results in the target domain, we propose a reward model based on a multi-scale fusion scoring mechanism that evaluates speaker similarity, speech quality, and intelligibility, ensuring that improvements in speaker similarity do not compromise speech quality or intelligibility. Experimental results on the LibriTTS and VCTK datasets within mainstream TTS frameworks demonstrate the extensibility and generalization capabilities of the proposed ASRRL method. The results indicate that ASRRL significantly outperforms traditional fine-tuning approaches, achieving higher speaker similarity and better overall speech quality with limited reference speech.
Submitted 7 July, 2024;
originally announced July 2024.
-
Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense
Authors:
Qi Zhou,
Zipeng Ye,
Yubo Tang,
Wenjian Luo,
Yuhui Shi,
Yan Jia
Abstract:
Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition. However, DNN models are vulnerable to backdoor attacks. A backdoor in a DNN model can be activated by a poisoned input containing a trigger, leading to wrong predictions and causing serious security issues in applications. It is challenging for current defenses to eliminate the backdoor effectively with limited computing resources, especially when the sizes and numbers of the triggers vary, as they do in the physical world. We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair. In the first phase of our method, we propose the CAM-focus Evolutionary Trigger Filter (CETF) for trigger detection. CETF is an effective sample-preprocessing method based on an evolutionary algorithm, and our experimental results show that CETF not only accurately distinguishes images with triggers from clean images, but can also be widely used in practice owing to its simplicity and stability across different backdoor attack situations. In the second phase of our method, we leverage several lightweight unlearning methods with the trigger detected by CETF for model repair, which also constructively demonstrates the underlying correlation of the backdoor with Batch Normalization layers. Source code will be published upon acceptance.
Submitted 7 July, 2024;
originally announced July 2024.
-
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Authors:
Haorui He,
Zengqiang Shang,
Chaoren Wang,
Xuyuan Li,
Yicheng Gu,
Hua Hua,
Liwei Liu,
Chen Yang,
Jiaqi Li,
Peiyang Shi,
Yuancheng Wang,
Kai Chen,
Pengyuan Zhang,
Zhizheng Wu
Abstract:
Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents \textit{Emilia}, the first multilingual speech generation dataset built from in-the-wild speech data, and Emilia-Pipe, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation. Emilia starts with over 101k hours of speech in six languages and features diverse speech with varied speaking styles. To facilitate the scale-up of Emilia, the open-source Emilia-Pipe pipeline can turn one hour of raw speech data into model-ready training data in a few minutes, which enables the research community to collaborate on large-scale speech generation research. Experimental results validate the effectiveness of Emilia. Demos are available at: https://emilia-dataset.github.io/Emilia-Demo-Page/.
Submitted 7 July, 2024;
originally announced July 2024.
-
KAE: A Property-based Method for Knowledge Graph Alignment and Extension
Authors:
Daqian Shi,
Xiaoyue Li,
Fausto Giunchiglia
Abstract:
A common solution to the semantic heterogeneity problem is to perform knowledge graph (KG) extension exploiting the information encoded in one or more candidate KGs, where the alignment between the reference KG and candidate KGs is considered the critical procedure. However, existing KG alignment methods mainly rely on entity type (etype) label matching as a prerequisite, which performs poorly in practice or is not applicable in some cases. In this paper, we design a machine learning-based framework for KG extension, including a novel alternative property-based alignment approach that aligns etypes on the basis of the properties used to define them. The main intuition is that it is properties that intentionally define the etype, and this definition is independent of the specific label used to name an etype and of the specific hierarchical schema of KGs. Compared with the state of the art, the experimental results show the validity of the KG alignment approach and the superiority of the proposed KG extension framework, both quantitatively and qualitatively.
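The property-based intuition can be illustrated with a toy sketch that aligns entity types by the overlap of their property sets while never comparing their labels. The Jaccard score, the 0.5 threshold, the example etypes, and the function names are assumptions for illustration only; the paper itself uses a machine learning-based framework, not this heuristic.

```python
def property_similarity(props_a: set[str], props_b: set[str]) -> float:
    """Jaccard overlap of two property sets: 0.0 = disjoint, 1.0 = identical."""
    union = props_a | props_b
    if not union:
        return 0.0
    return len(props_a & props_b) / len(union)


def align_etypes(
    reference: dict[str, set[str]],
    candidate: dict[str, set[str]],
    threshold: float = 0.5,  # hypothetical cut-off below which no alignment is made
) -> dict[str, str]:
    """Map each reference etype to the candidate etype whose properties overlap most.

    Etype labels are never compared; only the property sets that define them are.
    """
    alignment: dict[str, str] = {}
    for ref_name, ref_props in reference.items():
        best_name, best_score = None, 0.0
        for cand_name, cand_props in candidate.items():
            score = property_similarity(ref_props, cand_props)
            if score > best_score:
                best_name, best_score = cand_name, score
        if best_name is not None and best_score >= threshold:
            alignment[ref_name] = best_name
    return alignment


reference_kg = {
    "Person": {"name", "birthDate", "nationality"},
    "City": {"name", "population", "country"},
}
candidate_kg = {
    "Human": {"name", "birthDate", "nationality", "gender"},
    "Town": {"name", "population", "country", "mayor"},
}
print(align_etypes(reference_kg, candidate_kg))  # {'Person': 'Human', 'City': 'Town'}
```

Because the labels "Person"/"Human" and "City"/"Town" differ, pure label matching would miss both pairs, whereas property overlap (0.75 for each matched pair) recovers them.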
Submitted 7 July, 2024;
originally announced July 2024.
-
Fluid-Antenna Enhanced Integrated Sensing and Communication: Joint Antenna Positioning and Beamforming Design
Authors:
Tian Hao,
Changxin Shi,
Yinghong Guo,
Bin Xia,
Feng Yang
Abstract:
This paper investigates a fluid antenna (FA) enhanced integrated sensing and communication (ISAC) system consisting of a base station (BS), multiple single-antenna communication users, and one point target, where the BS is equipped with FAs to enhance both the communication and sensing performance. First, we formulate a problem that maximizes the radar signal-to-noise ratio (SNR) by jointly optimizing the FAs' positions and transmit beamforming matrix. Then, to tackle this highly non-convex problem, we present efficient algorithms by using alternating optimization (AO), successive convex approximation (SCA), and semi-definite relaxation (SDR). Numerical results demonstrate the convergence behavior and effectiveness of the proposed algorithm.
Submitted 7 July, 2024;
originally announced July 2024.
-
Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing
Authors:
Rabimba Karanjai,
Aftab Hussain,
Md Rafiqul Islam Rabin,
Lei Xu,
Weidong Shi,
Mohammad Amin Alipour
Abstract:
Unit testing is crucial in software engineering for ensuring quality. However, it is not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive, as it requires specialized knowledge, and existing automated tools are often ineffective.
To address this, we propose an automated method for generating unit tests for such software, considering their unique features like complex logic and parallel processing. Recently, large language models (LLMs) have shown promise in coding and testing. We explored the capabilities of Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo) in creating unit tests for C++ parallel programs. Our results show that LLMs can generate mostly correct and comprehensive unit tests, although they have some limitations, such as repetitive assertions and blank test cases.
Submitted 6 July, 2024;
originally announced July 2024.
-
Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps
Authors:
Dingbang Wang,
Yu Zhao,
Sidong Feng,
Zhaoxu Zhang,
William G. J. Halfond,
Chunyang Chen,
Xiaoxia Sun,
Jiangfan Shi,
Tingting Yu
Abstract:
In software development, bug report reproduction is a challenging task. This paper introduces ReBL, a novel feedback-driven approach that leverages GPT-4, a large-scale language model, to automatically reproduce Android bug reports. Unlike traditional methods, ReBL bypasses the use of Steps to Reproduce (S2R) entities. Instead, it leverages the entire textual bug report and employs innovative prompts to enhance GPT's contextual reasoning. This approach is more flexible and context-aware than the traditional step-by-step entity matching approach, resulting in improved accuracy and effectiveness. In addition to crash reports, ReBL can also handle non-crash bug reports. Our evaluation of 96 Android bug reports (73 crash and 23 non-crash) demonstrates that ReBL successfully reproduced 90.63% of these reports, averaging only 74.98 seconds per bug report. Additionally, ReBL outperformed three existing tools in both success rate and speed.
Submitted 6 July, 2024;
originally announced July 2024.
-
Search for the baryon number and lepton number violating decays $τ^-\to Λπ^-$ and $τ^-\to \barΛπ^-$ at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien
, et al. (349 additional authors not shown)
Abstract:
We present a search for the baryon number $B$ and lepton number $L$ violating decays $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛ π^-$ produced from the $e^+e^-\to τ^+τ^-$ process, using a 364 fb$^{-1}$ data sample collected by the Belle~II experiment at the SuperKEKB collider. No evidence of signal is found in either decay mode, which have $|Δ(B-L)|$ equal to $2$ and $0$, respectively. Upper limits at 90\% credibility level on the branching fractions of $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛπ^-$ are determined to be $4.7 \times 10^{-8}$ and $4.3 \times 10^{-8}$, respectively.
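As a rough illustration of how a limit on the signal count becomes a limit on a branching fraction, the sketch below takes the simplest Bayesian case: zero observed events, zero expected background, and a flat prior on the signal yield. These simplifications, the function names, and any input values are assumptions; the actual analysis models backgrounds and uncertainties, and its cross section and efficiencies are not reproduced here. Each $e^+e^-\to τ^+τ^-$ event supplies two τ leptons, hence the factor of 2.

```python
import math


def signal_upper_limit_zero_events(credibility: float = 0.90) -> float:
    """Upper limit on the expected signal yield s for 0 observed events.

    Assumes zero background and a flat prior on s >= 0, so the posterior is
    p(s) = exp(-s) and the limit solves 1 - exp(-s_up) = credibility.
    """
    if not 0.0 < credibility < 1.0:
        raise ValueError("credibility must lie strictly between 0 and 1")
    return -math.log(1.0 - credibility)


def branching_fraction_upper_limit(
    s_up: float,
    luminosity_fb_inv: float,  # integrated luminosity in fb^-1
    sigma_tautau_fb: float,    # e+e- -> tau+tau- cross section in fb (assumed input)
    efficiency: float,         # signal selection efficiency (assumed input)
) -> float:
    """B_UL = s_up / (N_tau * efficiency), with N_tau = 2 * L * sigma."""
    n_tau = 2.0 * luminosity_fb_inv * sigma_tautau_fb
    return s_up / (n_tau * efficiency)


s_up = signal_upper_limit_zero_events(0.90)
print(round(s_up, 3))  # 2.303
```

With 364 fb$^{-1}$ as `luminosity_fb_inv`, the limit scales inversely with the product of cross section and efficiency, which is why a larger data sample or a more efficient selection tightens it.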
Submitted 6 July, 2024;
originally announced July 2024.
-
MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning
Authors:
Min Zhang,
Jianye Hao,
Xian Fu,
Peilong Han,
Hao Zhang,
Lei Shi,
Hongyao Tang,
Yan Zheng
Abstract:
In recent years, Multi-modal Foundation Models (MFMs) and Embodied Artificial Intelligence (EAI) have been advancing side by side at an unprecedented pace. The integration of the two has garnered significant attention from the AI research community. In this work, we attempt to provide an in-depth and comprehensive evaluation of the performance of MFMs on embodied task planning, aiming to shed light on their capabilities and limitations in this domain. To this end, based on the characteristics of embodied task planning, we first develop a systematic evaluation framework, which encapsulates four crucial capabilities of MFMs: object understanding, spatio-temporal perception, task understanding, and embodied reasoning. Following this, we propose a new benchmark, named MFE-ETP, characterized by its complex and variable task scenarios, typical yet diverse task types, task instances of varying difficulties, and rich test case types ranging from multiple embodied question answering to embodied task reasoning. Finally, we offer a simple and easy-to-use automatic evaluation platform that enables the automated testing of multiple MFMs on the proposed benchmark. Using the benchmark and evaluation platform, we evaluated several state-of-the-art MFMs and found that they significantly lag behind human-level performance. MFE-ETP is a high-quality, large-scale, and challenging benchmark relevant to real-world tasks.
Submitted 6 July, 2024;
originally announced July 2024.
-
Spontaneous Reward Hacking in Iterative Self-Refinement
Authors:
Jane Pan,
He He,
Samuel R. Bowman,
Shi Feng
Abstract:
Language models are capable of iteratively improving their outputs based on natural language feedback, thus enabling in-context optimization of user preference. In place of human users, a second language model can be used as an evaluator, providing feedback along with numerical ratings which the generator attempts to optimize. However, because the evaluator is an imperfect proxy of user preference, this optimization can lead to reward hacking, where the evaluator's ratings improve while the generation quality remains stagnant or even decreases as judged by actual user preference. The concern of reward hacking is heightened in iterative self-refinement where the generator and the evaluator use the same underlying language model, in which case the optimization pressure can drive them to exploit shared vulnerabilities. Using an essay editing task, we show that iterative self-refinement leads to deviation between the language model evaluator and human judgment, demonstrating that reward hacking can occur spontaneously in-context with the use of iterative self-refinement. In addition, we study conditions under which reward hacking occurs and observe two factors that affect reward hacking severity: model size and context sharing between the generator and the evaluator.
Submitted 5 July, 2024;
originally announced July 2024.
-
Multi-Antenna Technology for 6G Integrated Sensing and Communication
Authors:
Yong Zeng,
Zhenjun Dong,
Huizhi Wang,
Lipeng Zhu,
Ziyao Hong,
Qingji Jiang,
Dongming Wang,
Shi **,
Rui Zhang
Abstract:
By deploying antenna arrays at the transmitter/receiver to provide additional spatial-domain degrees of freedom (DoFs), multi-antenna technology greatly improves the reliability and efficiency of wireless communication. Meanwhile, the application of multi-antenna technology in the radar field has achieved spatial angle resolution and improved sensing DoF, thus significantly enhancing wireless sensing performance. However, wireless communication and radar sensing have undergone independent development over the past few decades. As a result, although multi-antenna technology has dramatically advanced in these two fields separately, it has not been deeply integrated by exploiting their synergy. A new opportunity to fill this gap arises as the integration of sensing and communication has been identified as one of the typical usage scenarios of the 6G communication network. Motivated by the above, this article aims to explore multi-antenna technology for 6G integrated sensing and communication (ISAC), with a focus on its future development trends such as the continuous expansion of antenna array scale, more diverse array architectures, and more flexible antenna designs. First, we introduce several new and promising antenna architectures, including centralized antenna architectures based on traditional compact arrays or emerging sparse arrays, distributed antenna architectures exemplified by cell-free massive MIMO, and movable/fluid antennas with flexible positions and/or orientations in a given 3D space. Next, for each antenna architecture mentioned above, we present the corresponding far-field/near-field channel models and analyze the communication and sensing performance. Finally, we summarize the characteristics of different antenna architectures and look forward to new ideas for solving the difficulties in acquiring channel state information (CSI) caused by the continuous expansion of antenna array scale and flexible antenna designs.
Submitted 5 July, 2024;
originally announced July 2024.
-
Efficient Detection of Long Consistent Cycles and its Application to Distributed Synchronization
Authors:
Shaohan Li,
Yunpeng Shi,
Gilad Lerman
Abstract:
Group synchronization plays a crucial role in global pipelines for Structure from Motion (SfM). Its formulation is nonconvex and it is faced with highly corrupted measurements. Cycle consistency has been effective in addressing these challenges. However, computationally efficient solutions are needed for cycles longer than three, especially in practical scenarios where 3-cycles are unavailable. To overcome this computational bottleneck, we propose an algorithm for group synchronization that leverages information from cycles of lengths ranging from three to six with a time complexity of order $O(n^3)$ (or $O(n^{2.373})$ when using a faster matrix multiplication algorithm). We establish non-trivial theory for this and related methods that achieves competitive sample complexity, assuming the uniform corruption model. To advocate the practical need for our method, we consider distributed group synchronization, which requires at least 4-cycles, and we illustrate state-of-the-art performance by our method in this context.
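Cycle consistency can be made concrete for the simplest case: 3-cycles in the rotation group U(1), with each measured angle stored as a unit complex number. If every measurement is clean, g_ik g_kj = g_ij around every triangle, so the deviation averaged over the common neighbours k flags corrupted edges; the triple loop costs O(n^3), the same order as one dense matrix product. This is only a hedged illustration of the general idea with hypothetical names and angles; the paper's algorithm uses cycles of lengths three to six and its own weighting, neither of which is reproduced here.

```python
import cmath
import math


def three_cycle_inconsistency(G: list[list[complex]]) -> list[list[float]]:
    """Average 3-cycle deviation per measured edge of a U(1) measurement graph.

    G[i][j] holds the measured ratio exp(1j * (theta_i - theta_j)), or 0 if the
    edge (i, j) was not measured. For each measured edge, returns the mean of
    |G[i][k] * G[k][j] - G[i][j]| over all k closing a triangle i -> k -> j
    (NaN if no such k exists). Clean edges in clean triangles score close to 0.
    """
    n = len(G)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or G[i][j] == 0:
                continue
            deviations = [
                abs(G[i][k] * G[k][j] - G[i][j])
                for k in range(n)
                if k not in (i, j) and G[i][k] != 0 and G[k][j] != 0
            ]
            scores[i][j] = sum(deviations) / len(deviations) if deviations else math.nan
    return scores


# Four nodes with hypothetical ground-truth angles; corrupt edge (0, 1) by 1 radian.
theta = [0.0, 0.7, 1.9, -0.4]
G = [[cmath.exp(1j * (theta[i] - theta[j])) for j in range(4)] for i in range(4)]
G[0][1] *= cmath.exp(1j)
G[1][0] = G[0][1].conjugate()
S = three_cycle_inconsistency(G)
print(round(S[0][1], 3), round(S[2][3], 3))  # 0.959 0.0
```

The corrupted edge also raises the score of edges that share a triangle with it (here S[0][2] is about half of S[0][1]), so a single pass of this score is not enough on its own to separate clean from corrupted edges.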
Submitted 5 July, 2024;
originally announced July 2024.
-
A General Maximum Principle for Progressive Optimal Control of Fully Coupled Forward-Backward Stochastic Systems with Jumps
Authors:
Bin Wang,
Yu Si,
**gtao Shi
Abstract:
This paper is concerned with a general maximum principle for the fully coupled forward-backward stochastic optimal control problem with jumps, where the control domain is not necessarily convex, within the progressively measurable framework. It is worth noting that not only the control variable but also the jump size "$e$" enters into all the coefficients. We first propose that the solution $Z$ of the BSDEP also contains the variable "$e$", which differs from previous articles, and we provide an explanation in Remark 2.1.
Submitted 4 July, 2024;
originally announced July 2024.
-
C$^3$DG: Conditional Domain Generalization for Hyperspectral Imagery Classification with Convergence and Constrained-risk Theories
Authors:
Zhe Gao,
Bin Pan,
Zhenwei Shi
Abstract:
Hyperspectral imagery (HSI) classification may suffer from the challenge of hyperspectral-monospectra, where different classes present similar spectra. Joint spatial-spectral feature extraction is a popular solution to this problem, but this strategy tends to inflate accuracy since test pixels may exist in training patches. Domain generalization methods show promising potential, but they still fail to distinguish similar spectra across varying domains; in addition, their theoretical support is usually ignored. In this paper, we rely only on spectral information to solve the hyperspectral-monospectra problem, and propose a Convergence and Error-Constrained Conditional Domain Generalization method for Hyperspectral Imagery Classification (C$^3$DG). The major contributions of this paper include two aspects: the Conditional Revising Inference Block (CRIB), and the corresponding theories for model convergence and generalization errors. CRIB is the core structure of the proposed method, which employs a shared encoder and multi-branch decoders to fully leverage the conditional distribution during training, achieving a decoupling that aligns with the generation mechanisms of HSI. Moreover, to ensure model convergence and maintain controllable error, we propose the optimization convergence theorem and the risk upper bound theorem. In the optimization convergence theorem, we ensure model convergence by demonstrating that the gradients of the loss terms are not contradictory. In the risk upper bound theorem, our theoretical analysis explores the relationship between test-time training and recent related work to establish a concrete error bound. Experimental results on three benchmark datasets indicate the superiority of C$^3$DG.
Submitted 4 July, 2024;
originally announced July 2024.
-
Occupancy as Set of Points
Authors:
Yiang Shi,
Tianheng Cheng,
Qian Zhang,
Wenyu Liu,
Xinggang Wang
Abstract:
In this paper, we explore a novel point representation for 3D occupancy prediction from multi-view images, which is named Occupancy as Set of Points. Existing camera-based methods tend to exploit dense volume-based representation to predict the occupancy of the whole scene, making it hard to focus on the special areas or areas out of the perception range. In comparison, we present the Points of Interest (PoIs) to represent the scene and propose OSP, a novel framework for point-based 3D occupancy prediction. Owing to the inherent flexibility of the point-based representation, OSP achieves strong performance compared with existing methods and excels in terms of training and inference adaptability. It extends beyond traditional perception boundaries and can be seamlessly integrated with volume-based methods to significantly enhance their effectiveness. Experiments on the Occ3D nuScenes occupancy benchmark show that OSP has strong performance and flexibility. Code and models are available at \url{https://github.com/hustvl/osp}.
Submitted 4 July, 2024;
originally announced July 2024.
-
Entanglement Polygon Inequalities for A Class of Mixed States
Authors:
Xian Shi
Abstract:
The study of entanglement polygon inequalities in multipartite systems has attracted much attention. However, most of the results concern pure states. Here we consider this property for a class of mixed states, namely the reduced density matrices of generalized W-class states in multipartite higher-dimensional systems. First, we show that this class of mixed states satisfies the entanglement polygon inequalities in terms of Tsallis-q entanglement; then we propose a class of tighter inequalities for these mixed states in terms of Tsallis-q entanglement. Finally, we obtain an inequality for the mixed states that can be regarded as a relation for bipartite entanglement.
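For readers unfamiliar with the quantities involved: the Tsallis-q entropy of a state with eigenvalues p_k is T_q = (1 - Σ_k p_k^q)/(q - 1), which tends to the von Neumann entropy as q → 1, and the polygon inequality requires each one-versus-rest entanglement to be at most the sum of the others. The sketch below checks this only for pure qubit W-class states Σ_i a_i|0...1_i...0>, whose one-qubit reductions have eigenvalues {|a_i|², 1 - |a_i|²}. It is a simplified illustration with hypothetical function names, not the paper's mixed-state, higher-dimensional result.

```python
import math


def tsallis_entropy(probs: list[float], q: float) -> float:
    """Tsallis-q entropy T_q = (1 - sum p^q) / (q - 1) of an eigenvalue spectrum.

    For q == 1 the natural-log Shannon (von Neumann) entropy is returned instead.
    """
    if q <= 0:
        raise ValueError("q must be positive")
    nonzero = [p for p in probs if p > 0]
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in nonzero)
    return (1.0 - sum(p**q for p in nonzero)) / (q - 1.0)


def w_class_polygon_holds(weights: list[float], q: float) -> bool:
    """Check E_i <= sum_{j != i} E_j for a pure qubit W-class state.

    weights[i] = |a_i|^2 (summing to 1). The cut "qubit i | rest" has Tsallis-q
    entanglement T_q({w_i, 1 - w_i}), the entropy of the one-qubit reduction.
    """
    ent = [tsallis_entropy([w, 1.0 - w], q) for w in weights]
    total = sum(ent)
    return all(e <= (total - e) + 1e-12 for e in ent)


print(round(tsallis_entropy([2 / 3, 1 / 3], q=2), 4))  # 0.4444
print(w_class_polygon_holds([1 / 3, 1 / 3, 1 / 3], q=2))  # True
```

For the symmetric three-qubit W state every cut carries the same entanglement (4/9 at q = 2), so each side of the "polygon" is half the sum of the other two.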
Submitted 4 July, 2024;
originally announced July 2024.
-
Serialized Output Training by Learned Dominance
Authors:
Ying Shi,
Lantian Li,
Shi Yin,
Dong Wang,
Jiqing Han
Abstract:
Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied on either the Permutation Invariant Training (PIT) or the time-based First-In-First-Out (FIFO) rule. This study presents a model-based serialization strategy that incorporates an auxiliary module into the Attention Encoder-Decoder architecture, autonomously identifying the crucial factors to order the output sequence of the speech components in multi-talker speech. Experiments conducted on the LibriSpeech and LibriMix databases reveal that our approach significantly outperforms the PIT and FIFO baselines in both 2-mix and 3-mix scenarios. Further analysis shows that the serialization module identifies dominant speech components in a mixture by factors including loudness and gender, and orders speech components based on the dominance score.
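The difference between the FIFO rule and a dominance-based order can be shown with a toy serializer. The per-utterance dominance score is a hypothetical input here (in the paper it is produced by an auxiliary module learned inside the Attention Encoder-Decoder), and the `<sc>` speaker-change separator, the class, and the function names are assumed conventions for illustration.

```python
from dataclasses import dataclass

SPEAKER_CHANGE = "<sc>"  # assumed separator token between serialized speakers


@dataclass
class Utterance:
    speaker: str
    start_time: float  # seconds; the FIFO rule orders by this
    dominance: float   # hypothetical score, e.g. higher for louder speech
    text: str


def serialize(utterances: list[Utterance], rule: str = "dominance") -> str:
    """Concatenate the transcripts of a multi-talker mixture into one target string."""
    if rule == "fifo":
        ordered = sorted(utterances, key=lambda u: u.start_time)
    elif rule == "dominance":
        ordered = sorted(utterances, key=lambda u: u.dominance, reverse=True)
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return f" {SPEAKER_CHANGE} ".join(u.text for u in ordered)


mixture = [
    Utterance("A", start_time=0.0, dominance=0.3, text="hello there"),
    Utterance("B", start_time=1.2, dominance=0.9, text="good morning"),
]
print(serialize(mixture, "fifo"))       # hello there <sc> good morning
print(serialize(mixture, "dominance"))  # good morning <sc> hello there
```

Either rule yields a single label sequence for the decoder; they differ only in which speaker's transcript comes first, which is exactly the label-permutation choice the abstract describes.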
Submitted 4 July, 2024;
originally announced July 2024.
-
TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models
Authors:
Jiahuan Cao,
Dezhi Peng,
Peirong Zhang,
Yongxin Shi,
Yang Liu,
Kai Ding,
Lianwen **
Abstract:
Classical Chinese is a gateway to the rich heritage and wisdom of ancient China, yet its complexities pose formidable comprehension barriers for most modern people without specialized knowledge. While Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), they struggle with Classical Chinese Understanding (CCU), especially in data-demanding and knowledge-intensive tasks. In response to this dilemma, we propose \textbf{TongGu} (meaning "understanding the ancient and the modern"), the first CCU-specific LLM, underpinned by three core contributions. First, we construct a two-stage instruction-tuning dataset, ACCN-INS, derived from rich classical Chinese corpora, aiming to unlock the full CCU potential of LLMs. Second, we propose Redundancy-Aware Tuning (RAT) to prevent catastrophic forgetting, enabling TongGu to acquire new capabilities while preserving its foundational knowledge. Third, we present a CCU Retrieval-Augmented Generation (CCU-RAG) technique to reduce hallucinations based on knowledge grounding. Extensive experiments across 24 diverse CCU tasks validate TongGu's superior ability, underscoring the effectiveness of RAT and CCU-RAG. The model and dataset will be made publicly available.
Submitted 4 July, 2024;
originally announced July 2024.
-
DSMix: Distortion-Induced Sensitivity Map Based Pre-training for No-Reference Image Quality Assessment
Authors:
**song Shi,
Pan Gao,
Xiaojiang Peng,
Jie Qin
Abstract:
Image quality assessment (IQA) has long been a fundamental challenge in image understanding. In recent years, deep learning-based IQA methods have shown promising performance. However, the lack of large amounts of labeled data in the IQA field has hindered further advancements in these methods. This paper introduces DSMix, a novel data augmentation technique specifically designed for IQA tasks, aiming to overcome this limitation. DSMix leverages the distortion-induced sensitivity map (DSM) of an image as prior knowledge. It applies cut and mix operations to diverse categories of synthetic distorted images, assigning confidence scores to class labels based on the aforementioned prior knowledge. In the pre-training phase using DSMix-augmented data, knowledge distillation is employed to enhance the model's ability to extract semantic features. Experimental results on both synthetic and authentic IQA datasets demonstrate the significant predictive and generalization performance achieved by DSMix, without requiring fine-tuning of the full model. Code is available at https://github.com/I2-Multimedia-Lab/DSMix.
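A cut-and-mix step with map-weighted labels could look roughly like the following NumPy sketch. A box from image B is pasted into image A. Each source's label weight is its share of the sensitivity mass in the pixels it contributes. The maps passed in are arbitrary arrays standing in for the DSM, and `cut_and_mix` is a hypothetical helper, not the released DSMix code.

```python
import numpy as np


def cut_and_mix(img_a, img_b, sens_a, sens_b, box):
    """Paste img_b[box] into img_a and return (mixed image, (w_a, w_b)).

    box = (top, left, height, width). sens_a / sens_b are per-pixel (H, W)
    sensitivity maps (hypothetical stand-ins for the DSM). Each label weight is
    the sensitivity mass of the pixels that image contributes, normalized to 1.
    """
    top, left, h, w = box
    mask = np.zeros(img_a.shape[:2], dtype=bool)
    mask[top:top + h, left:left + w] = True

    mixed = img_a.copy()
    mixed[mask] = img_b[mask]  # works for (H, W) and (H, W, C) images

    mass_a = float(sens_a[~mask].sum())  # sensitivity A keeps outside the box
    mass_b = float(sens_b[mask].sum())   # sensitivity B contributes inside the box
    total = mass_a + mass_b
    if total == 0.0:  # degenerate maps: fall back to plain CutMix area ratio
        frac_b = float(mask.mean())
        return mixed, (1.0 - frac_b, frac_b)
    return mixed, (mass_a / total, mass_b / total)
```

With uniform maps the weights reduce to CutMix's area ratio. A non-uniform map shifts label confidence toward whichever image contributes more sensitive pixels.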
Submitted 4 July, 2024;
originally announced July 2024.
-
CS3: Cascade SAM for Sperm Segmentation
Authors:
Yi Shi,
Xu-Peng Tian,
Yun-Kai Wang,
Tie-Yi Zhang,
Bin Yao,
Hui Wang,
Yong Shao,
Cen-Cen Wang,
Rong Zeng,
De-Chuan Zhan
Abstract:
Automated sperm morphology analysis plays a crucial role in the assessment of male fertility, yet its efficacy is often compromised by the difficulty of accurately segmenting sperm images. Existing segmentation techniques, including the Segment Anything Model (SAM), are notably inadequate for addressing sperm overlap, a frequent occurrence in clinical samples. Our exploratory studies reveal that modifying image characteristics by removing sperm heads and easily segmentable areas, alongside enhancing the visibility of overlapping regions, markedly improves SAM's efficiency in segmenting intricate sperm structures. Motivated by these findings, we present the Cascade SAM for Sperm Segmentation (CS3), an unsupervised approach specifically designed to tackle sperm overlap. This method applies SAM in a cascade to segment sperm heads, simple tails, and complex tails in stages. The segmented masks are then matched and joined to construct complete sperm masks. In collaboration with leading medical institutions, we have compiled a dataset of approximately 2,000 unlabeled sperm images to fine-tune our method, and secured expert annotations for an additional 240 images to enable comprehensive model assessment. Experimental results demonstrate the superior performance of CS3 compared with existing methods.
Submitted 4 July, 2024;
originally announced July 2024.
-
Segregation at prior austenite grain boundaries: the competition between boron and hydrogen
Authors:
Guillaume Hachet,
Ali Tehranchi,
Hao Shi,
Manoj Prabhakar,
Shaolou Wei,
Katja Angenendt,
Stefan Zaefferer,
Baptiste Gault,
Binhan Sun,
Dirk Ponge,
Dierk Raabe
Abstract:
The interaction between boron and hydrogen at grain boundaries has been investigated experimentally and numerically in boron-doped and boron-free martensitic steels using thermal desorption spectrometry (TDS) and ab initio calculations. The calculations show that boron, which segregates mostly to prior austenite grain boundaries (PAGBs), repels hydrogen. This behavior is also observed in TDS measurements, in which one peak disappears when boron is incorporated into the microstructure. Additionally, the microstructures of both boron-doped and boron-free steels have been studied through electron backscatter diffraction, electron channeling contrast imaging, synchrotron X-ray measurements, and atom probe tomography. While both steels have similar grain sizes, grain boundary distributions, and dislocation densities, pronounced boron segregation to PAGBs is observed in boron-doped steels. The equilibrium hydrogen concentration at the different trapping sites has therefore been evaluated using the Langmuir-McLean approximation. This thermodynamic model shows that, for the boron-free steel, hydrogen is distributed identically over all traps when the total hydrogen concentration is low. However, as the concentration increases, the traps with the lowest segregation energies (mostly PAGBs) saturate first, which promotes failure initiation at this defect type. This finding partially explains why PAGBs are the weakest microstructural feature when martensitic steels are exposed to hydrogen-containing environments.
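The Langmuir-McLean relation can be evaluated per trap type in a few lines. This sketch uses the usual form θ/(1−θ) = c/(1−c)·exp(−E_seg/k_BT), with negative E_seg meaning the trap attracts hydrogen. It assumes a fixed lattice hydrogen fraction `c_lattice`, whereas the paper solves for a given total concentration. The trap names and energies below are hypothetical placeholders, not values from the paper.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def trap_occupancy(c_lattice: float, e_seg_ev: float, temperature_k: float) -> float:
    """Langmuir-McLean equilibrium site fraction theta of one trap type.

    theta / (1 - theta) = c / (1 - c) * exp(-E_seg / (k_B T)),
    where E_seg < 0 means hydrogen prefers the trap over the lattice.
    """
    ratio = c_lattice / (1.0 - c_lattice) * math.exp(-e_seg_ev / (K_B_EV * temperature_k))
    return ratio / (1.0 + ratio)


# Hypothetical segregation energies in eV, for illustration only.
TRAPS = {"dislocation": -0.15, "lath boundary": -0.25, "PAGB": -0.40}
```

For example, `trap_occupancy(c, TRAPS["PAGB"], 300.0)` approaches 1 at a lattice fraction where the shallower traps are still mostly empty. This mirrors the "deepest traps saturate first" behaviour described above.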
Submitted 4 July, 2024;
originally announced July 2024.
-
Compositions of the Hercules-Aquila Cloud and Virgo Over-density
Authors:
Dashuang Ye,
Cuihua Du,
Mingji Deng,
Jiwei Liao,
Yang Huang,
Jianrong Shi,
Jun Ma
Abstract:
Based on a sample of K giants from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) Data Release 8 and a sample of RR Lyrae stars (RRLs) from Gaia Data Release 3, we investigate the compositions of the Hercules-Aquila Cloud (HAC) and the Virgo Over-density (VOD) and their collective contribution to the tilt and triaxiality of the stellar halo ($r < 40\,{\rm kpc}$), as well as to two breaks at $\approx 15\,{\rm kpc}$ and $30\,{\rm kpc}$. We apply a Gaussian mixture model (GMM) to divide the stellar halo into an isotropic component and a radially biased anisotropic component, namely Gaia-Sausage-Enceladus (GSE), and find that both HAC and VOD are dominated by GSE debris stars, with weights of $0.67^{+0.09}_{-0.07}$ and $0.57^{+0.07}_{-0.06}$, respectively. In addition, using the K giants with orbital parameters, we identify member stars of known substructures, including GSE, Sagittarius (Sgr), the Helmi Streams, Sequoia, Thamnos, Pontus, Wukong, and the Metal-weak Thick Disk (MWTD), to probe the compositions of low-eccentricity stars in the HAC and VOD regions. In density fits to the RRL sample, we find that excluding HAC and VOD has only a weak effect on the shape of the halo. Finally, we find that the radially biased anisotropic halo contributes the majority of the stellar halo, which can be modelled with a tilted triaxial ellipsoid and a doubly broken power law with break radii at $18.08^{+2.04}_{-3.22}\,{\rm kpc}$ and $33.03^{+1.30}_{-1.21}\,{\rm kpc}$. This is important for understanding the status of large diffuse over-densities in the Milky Way.
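A doubly broken power law, continuous at both breaks, can be written as a small function. The default break radii are the fitted values quoted above. The three slopes and the normalization are placeholders, not the paper's fitted values. The ellipsoidal radius helper omits the tilt rotation for brevity, and its axis-ratio names `p` and `q` are assumptions.

```python
import math


def doubly_broken_power_law(r_kpc: float,
                            r_break1: float = 18.08,
                            r_break2: float = 33.03,
                            alpha_inner: float = 2.5,   # placeholder slope
                            alpha_middle: float = 3.5,  # placeholder slope
                            alpha_outer: float = 5.0,   # placeholder slope
                            rho0: float = 1.0) -> float:
    """Density profile continuous at both breaks, normalised to rho0 at r_break1."""
    if r_kpc <= 0:
        raise ValueError("radius must be positive")
    if r_kpc <= r_break1:
        return rho0 * (r_kpc / r_break1) ** (-alpha_inner)
    if r_kpc <= r_break2:
        return rho0 * (r_kpc / r_break1) ** (-alpha_middle)
    # Match the middle segment's value at r_break2, then continue with the outer slope.
    rho_at_break2 = rho0 * (r_break2 / r_break1) ** (-alpha_middle)
    return rho_at_break2 * (r_kpc / r_break2) ** (-alpha_outer)


def ellipsoidal_radius(x: float, y: float, z: float, p: float = 1.0, q: float = 1.0) -> float:
    """Radius on an untilted triaxial ellipsoid with axis ratios p = b/a and q = c/a."""
    return math.sqrt(x ** 2 + (y / p) ** 2 + (z / q) ** 2)
```

In a full halo model, the density would be evaluated at `ellipsoidal_radius(...)` after rotating Galactocentric coordinates into the tilted frame.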
Submitted 4 July, 2024;
originally announced July 2024.
-
Scale-Dependent Dynamic Alignment in MHD Turbulence: Insights into Intermittency, Compressibility, and Imbalance Effects
Authors:
Nikos Sioulas,
Marco Velli,
Alfred Mallet,
Trevor A. Bowen,
B. D. G. Chandran,
Chen Shi,
S. S. Cerri,
Ioannis Liodis,
Tamar Ervin,
Davin E. Larson
Abstract:
Scale-Dependent Dynamic Alignment (SDDA) in Elsässer field fluctuations is theorized to suppress nonlinearities and modulate the energy spectrum. Limited empirical evidence exists for SDDA within the solar wind turbulence's inertial range. We analyzed data from the WIND mission to assess the effects of compressibility, intermittency, and imbalance on SDDA. SDDA consistently appears at energy-containing scales, with a trend toward misalignment at inertial scales. Compressible fluctuations show no increased alignment; however, their impact on SDDA's overall behavior is minimal. The alignment angles inversely correlate with field gradient intensity, likely due to "anomalous" or "counterpropagating" wave packet interactions. This suggests that SDDA originates from mutual shearing of Elsässer fields during imbalanced ($δ\boldsymbol{z}^{\pm} \gg δ\boldsymbol{z}^{\mp}$) interactions. Rigorous thresholding on field gradient intensity reveals SDDA signatures across much of the inertial range. The scaling of Elsässer increments' alignment angle, $Θ^{z}$, steepens with increasing global Alfvénic imbalance, while the angle between magnetic and velocity field increments, $Θ^{ub}$, becomes shallower. $Θ^{ub}$ only correlates with global Elsässer imbalance, steepening as the imbalance increases. Furthermore, increasing alignment in $Θ^{ub}$ persists deep into the inertial range of balanced intervals but collapses at large scales for imbalanced ones. Simplified theoretical analysis and modeling of high-frequency, low-amplitude noise in the velocity field indicate significant impacts on alignment angle measurements even at very low frequencies, with effects growing as global imbalance increases.
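Scale-dependent alignment angles are built from field increments at a given lag. This sketch uses one estimator common in the dynamic-alignment literature, sin Θ = ⟨|δa × δb|⟩ / ⟨|δa||δb|⟩. It treats aligned and anti-aligned increments alike. Whether the paper uses exactly this estimator is an assumption. The helpers work on any pair of 3D increment series, e.g. δu and δb for Θ^{ub} or δz⁺ and δz⁻ for Θ^z.

```python
import math
from typing import Sequence

Vec3 = tuple[float, float, float]


def increments(series: Sequence[Vec3], lag: int) -> list[Vec3]:
    """Field increments delta f = f(i + lag) - f(i) at a fixed lag (the 'scale')."""
    return [tuple(b - a for a, b in zip(series[i], series[i + lag]))
            for i in range(len(series) - lag)]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _norm(a: Vec3) -> float:
    return math.sqrt(sum(x * x for x in a))


def alignment_angle_deg(da: Sequence[Vec3], db: Sequence[Vec3]) -> float:
    """Ratio-of-averages estimator: sin(theta) = <|da x db|> / <|da||db|>, in degrees."""
    num = sum(_norm(_cross(a, b)) for a, b in zip(da, db))
    den = sum(_norm(a) * _norm(b) for a, b in zip(da, db))
    if den == 0.0:
        raise ValueError("increments are identically zero")
    return math.degrees(math.asin(min(1.0, num / den)))
```

Repeating the calculation over a range of lags gives the angle as a function of scale. The scaling of that curve with scale is what the abstract compares across imbalance levels.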
Submitted 4 July, 2024;
originally announced July 2024.
-
High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching
Authors:
Gael Le Lan,
Bowen Shi,
Zhaoheng Ni,
Sidd Srinivasan,
Anurag Kumar,
Brian Ellis,
David Kant,
Varun Nagaraja,
Ernie Chang,
Wei-Ning Hsu,
Yangyang Shi,
Vikas Chandra
Abstract:
We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low-frame-rate 48 kHz stereo variational autoencoder codec, which eliminates the information loss of discrete representations. Based on a diffusion transformer architecture trained with a flow-matching objective, the model can generate and edit diverse, high-quality stereo samples of variable duration from simple text descriptions. We also explore a new regularized latent inversion method for zero-shot, test-time text-guided editing and demonstrate its superior performance over naive denoising diffusion implicit model (DDIM) inversion for a variety of music editing prompts. Evaluations on both objective and subjective metrics demonstrate that the proposed model is not only competitive with the evaluated baselines on a standard text-to-music benchmark in terms of quality and efficiency, but also outperforms the previous state of the art for music editing when combined with our proposed latent inversion. Samples are available at https://melodyflow.github.io.
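A flow-matching objective is easiest to see with the straight-line path between noise and data. This sketch shows the regression target and Euler-integration sampling in plain Python. The linear path is a common conditional flow-matching choice. Whether the paper uses exactly this path and parametrization is an assumption. `velocity_model` stands in for the text-conditioned diffusion transformer acting on VAE latents.

```python
def interpolate(x0: list[float], x1: list[float], t: float) -> list[float]:
    """Point on the straight path from noise x0 (t=0) to data latent x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: list[float], x1: list[float]) -> list[float]:
    """Velocity of that path, constant in t: d x_t / dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_model, x0, x1, t) -> float:
    """Squared error between the model's velocity at (x_t, t) and the path velocity."""
    pred = velocity_model(interpolate(x0, x1, t), t)
    return sum((p - v) ** 2 for p, v in zip(pred, target_velocity(x0, x1)))


def euler_sample(velocity_model, x0, steps: int = 16) -> list[float]:
    """Generate by integrating dx/dt = v(x, t) from t=0 to t=1 with explicit Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_model(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With an "oracle" model that returns the exact path velocity, the loss is zero and Euler sampling lands on `x1`. A trained network only approximates this field, so the number of steps matters in practice.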
Submitted 4 July, 2024;
originally announced July 2024.
-
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval
Authors:
Kazuaki Furumai,
Roberto Legaspi,
Julio Vizcarra,
Yudai Yamazaki,
Yasutaka Nishimura,
Sina J. Semnani,
Kazushi Ikeda,
Weiyan Shi,
Monica S. Lam
Abstract:
Persuasion plays a pivotal role in a wide range of applications, from health intervention to the promotion of social good. Persuasive chatbots can accelerate the positive effects of persuasion in such applications. Existing methods rely on fine-tuning persuasive chatbots with task-specific training data, which is costly, if not infeasible, to collect. To address this issue, we propose a method that leverages the generalizability and inherent persuasive abilities of large language models (LLMs) to create effective and truthful persuasive chatbots for any given domain in a zero-shot manner. Unlike previous studies, which used predefined persuasion strategies, our method first uses an LLM to generate responses, then extracts the strategies used on the fly, and replaces any unsubstantiated claims in the response with retrieved facts supporting the strategies. We applied our chatbot, PersuaBot, to three significantly different domains that require persuasion skills: donation solicitation, recommendations, and health intervention. Our experiments on simulated and human conversations show that our zero-shot approach is more persuasive than prior work, while achieving factual accuracy surpassing that of state-of-the-art knowledge-oriented chatbots. Our study demonstrates that, when employed responsibly for social good, persuasive chatbots can be enablers of positive individual and social change.
Submitted 3 July, 2024;
originally announced July 2024.
-
Large Time Behavior of Solutions to Cauchy Problem for 1-D Compressible Isentropic Navier-Stokes/Allen-Cahn System
Authors:
Yazhou Chen,
Qiaolin He,
Xiaoding Shi
Abstract:
This paper is concerned with the large-time behavior of solutions to the Cauchy problem for the one-dimensional compressible Navier-Stokes/Allen-Cahn system, with the immiscible two-phase flow initially located near the phase separation state. Under the assumption that the initial data are a small perturbation of the constant state, we prove the global existence and uniqueness of the solutions and establish the time decay rates of the solution as well as of its higher-order spatial derivatives. Moreover, we show that the solutions of the system are time-asymptotically approximated by the solutions of the modified parabolic system and obtain decay rates in $L^2$ and $L^1$. Furthermore, we show that the solution of the system is time-asymptotically approximated in $L^p$ ($1 \leq p \leq +\infty$) by the diffusion waves.
Submitted 3 July, 2024;
originally announced July 2024.
-
Three-dimensional Imaging of Pion using Lattice QCD: Generalized Parton Distributions
Authors:
Heng-Tong Ding,
Xiang Gao,
Swagato Mukherjee,
Peter Petreczky,
Qi Shi,
Sergey Syritsyn,
Yong Zhao
Abstract:
In this work, we report a lattice calculation of $x$-dependent valence pion generalized parton distributions (GPDs) at zero skewness for multiple values of the momentum transfer $-t$. The calculations are based on an $N_f=2+1$ gauge ensemble of highly improved staggered quarks with Wilson-Clover valence fermions. The lattice spacing is 0.04 fm, and the valence pion mass is tuned to 300 MeV. We determine the Lorentz-invariant amplitudes of the quasi-GPD matrix elements for both symmetric and asymmetric momentum transfers of similar magnitude and show the equivalence of the two frames. Then, focusing on the asymmetric frame, we use a hybrid scheme to renormalize the quasi-GPD matrix elements obtained from the lattice calculations. After the Fourier transforms, the quasi-GPDs are matched to the light-cone GPDs within the framework of large-momentum effective theory with improved matching, including next-to-next-to-leading-order perturbative corrections as well as leading-renormalon and renormalization-group resummations. We also present the three-dimensional image of the pion in impact-parameter space, obtained through the Fourier transform with respect to the momentum transfer $-t$.
Submitted 3 July, 2024;
originally announced July 2024.
-
DDPM-MoCo: Advancing Industrial Surface Defect Generation and Detection with Generative and Contrastive Learning
Authors:
Yangfan He,
Xinyan Wang,
Tianyu Shi
Abstract:
The task of industrial detection based on deep learning often involves solving two problems: (1) obtaining sufficient and effective data samples and (2) using efficient and convenient model training methods. In this paper, we introduce a novel defect-generation method, named DDPM-MoCo, to address these issues. First, we use the Denoising Diffusion Probabilistic Model (DDPM) to generate high-quality defect data samples, overcoming the problem of insufficient sample data for model learning. Furthermore, we use the unsupervised Momentum Contrast model (MoCo) with an enhanced batch contrastive loss function to train the model on unlabeled data, addressing the efficiency and consistency challenges of large-scale negative sample encoding during diffusion model training. The experimental results showcase an enhanced visual detection method for identifying defects on metal surfaces, covering the entire process, from generating unlabeled sample data for training the diffusion model to using the same labeled sample data for downstream detection tasks. This study offers valuable practical insights and application potential for visual detection in the metal processing industry.
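The two core MoCo ingredients can be sketched in plain Python: the momentum update of the key encoder and the InfoNCE loss over one positive and several negative keys. This shows standard MoCo only, with its commonly used defaults m = 0.999 and τ = 0.07. The paper's "enhanced batch contrastive loss" is not described in the abstract and is not reproduced here. Parameters are flat lists of floats purely for illustration.

```python
import math


def momentum_update(key_params: list[float], query_params: list[float], m: float = 0.999) -> list[float]:
    """Key encoder slowly tracks the query encoder: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def _normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, tau: float = 0.07) -> float:
    """-log softmax score of the positive key among {positive} + negatives (L2-normalized)."""
    q = _normalize(query)
    logits = [_dot(q, _normalize(positive_key)) / tau]
    logits += [_dot(q, _normalize(k)) / tau for k in negative_keys]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

In MoCo the negatives come from a queue of past keys rather than the current batch. That queue is what makes large numbers of negatives affordable.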
Submitted 9 May, 2024;
originally announced July 2024.
-
A Unified Framework for 3D Scene Understanding
Authors:
Wei Xu,
Chunsheng Shi,
Sifan Tu,
Xin Zhou,
Dingkang Liang,
Xiang Bai
Abstract:
We propose UniSeg3D, a unified 3D segmentation framework that performs panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation tasks within a single model. Most previous 3D segmentation approaches are specialized for a specific task, thereby limiting their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method unifies six tasks into unified representations processed by the same Transformer. This facilitates inter-task knowledge sharing and, therefore, promotes comprehensive 3D scene understanding. To take advantage of multi-task unification, we further improve performance by leveraging task connections. Specifically, we design a knowledge distillation method and a contrastive learning method to transfer task-specific knowledge across different tasks. Benefiting from extensive inter-task knowledge sharing, UniSeg3D becomes more powerful. Experiments on three benchmarks, ScanNet20, ScanRefer, and ScanNet200, demonstrate that UniSeg3D consistently outperforms current SOTA methods, even those specialized for individual tasks. We hope UniSeg3D can serve as a solid unified baseline and inspire future work. The code will be available at https://dk-liang.github.io/UniSeg3D/.
Submitted 3 July, 2024;
originally announced July 2024.
-
Electromagnetic Property Sensing Based on Diffusion Model in ISAC System
Authors:
Yuhua Jiang,
Feifei Gao,
Shi **,
Tie Jun Cui
Abstract:
Integrated sensing and communications (ISAC) has opened up numerous game-changing opportunities for future wireless systems. In this paper, we develop a novel ISAC scheme that uses a diffusion model to sense the electromagnetic (EM) property of the target in a predetermined sensing area. Specifically, we first estimate the sensing channel using both the communications and the sensing signals echoed back from the target. Then we employ the diffusion model to generate a point cloud that represents the target, thus enabling 3D visualization of the target's EM property distribution. To minimize the mean Chamfer distance (MCD) between the ground-truth and the estimated point clouds, we further design the communications and sensing beamforming matrices under the constraints of a maximum transmit power and a minimum achievable communications rate for each user equipment (UE). Simulation results demonstrate the efficacy of the proposed method in achieving high-quality reconstruction of the target's shape, relative permittivity, and conductivity. In addition, the proposed method can effectively sense the EM property of the target at any position in the sensing area.
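The Chamfer distance used as the reconstruction metric can be sketched with the Python standard library. Conventions vary: some works use squared distances or a factor of 1/2, and the abstract does not say which one the paper uses. This sketch assumes the symmetric sum of mean nearest-neighbour Euclidean distances. It also assumes "mean" refers to averaging over several (ground truth, estimate) pairs, which is likewise an assumption.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance A->B plus B->A."""
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


def mean_chamfer_distance(pairs: Sequence[tuple[Sequence[Point], Sequence[Point]]]) -> float:
    """Average Chamfer distance over (ground truth, estimate) point-cloud pairs."""
    return sum(chamfer_distance(gt, est) for gt, est in pairs) / len(pairs)
```

This brute-force version is O(|A|·|B|). For realistically sized clouds a k-d tree nearest-neighbour search would be the usual replacement.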
Submitted 3 July, 2024;
originally announced July 2024.
-
Visible, Near-, and Mid-infrared Computational Spectrometer Enabled by Single-Spinning Film Encoder
Authors:
Junren Wen,
Weiming Shi,
Cheng Gao,
Yujie Liu,
Shuaibo Feng,
Yu Shao,
Haiqi Gao,
Yuchuan Shao,
Yueguang Zhang,
Weidong Shen,
Chenying Yang
Abstract:
Computational spectrometers are pivotal in enabling low-cost, in-situ, and rapid spectral analysis, with potential applications in chemistry, biology, and environmental science. However, filter-based spectral encoding approaches typically use filter arrays, which complicate the manufacturing process and hinder device consistency. By capitalizing on the polarization separation effect under oblique incidence (PSEOI), we pioneer the use of a single filter for highly efficient spectral encoding, and propose a novel computational spectrometer spanning visible to mid-infrared wavelengths by combining the Single-Spinning Film Encoder (SSFE) with a deep learning-based reconstruction algorithm. The particle swarm optimization (PSO) method is employed to optimize the film configuration of the SSFE, achieving low-correlation and high-complexity spectral responses under different polarizations and spinning angles, thereby enhancing both the spectral resolution and the reconstruction accuracy across diverse spectral ranges. Spectral resolutions of up to 0.5 nm, 2 nm, and 10 nm can be realized for single-peak narrowband spectra, and 3 nm, 6 nm, and 20 nm for dual-peak narrowband spectra, over the visible, near-, and mid-infrared wavelength ranges, respectively. Moreover, the proposed spectrometer achieves an overall precision of 81.38% in classifying 220 chemical compounds, confirming its robustness and precision in practical scenarios and its potential for compact, cost-effective spectroscopic solutions.
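A basic global-best particle swarm optimizer fits in a few dozen lines of Python. This is a generic PSO with the widely used constriction-derived coefficients (w ≈ 0.7298, c1 = c2 ≈ 1.496). It is not the authors' implementation. In the paper's setting the decision vector would be the film's layer thicknesses and the objective would score spectral-response correlation. Here `objective` is any callable and the test functions are placeholders.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best PSO; bounds is a list of (low, high) per dimension."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp to the feasible box (e.g. allowed layer-thickness range).
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

PSO needs only objective evaluations, not gradients. This suits merit functions that come from thin-film transfer-matrix simulations.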
Submitted 3 July, 2024;
originally announced July 2024.
-
Dielectric Fano Nanoantennas for Enabling Sub-Nanosecond Lifetimes in NV-based Single Photon Emitters
Authors:
Shu An,
Dmitry Kalashnikov,
Wenqiao Shi,
Zackaria Mahfoud,
Ah Bian Chew,
Yan Liu,
**g Wu,
Di Zhu,
Weibo Gao,
Cheng-Wei Qiu,
Victor Leong,
Zhaogang Dong
Abstract:
Solid-state quantum emitters are essential sources of single photons, and enhancing their emission rates is of paramount importance for applications in quantum communications, computing, and metrology. One approach is to couple quantum emitters with resonant photonic nanostructures, where the emission rate is enhanced by the Purcell effect. Dielectric nanoantennas are promising because they provide strong emission enhancement compared with plasmonic ones, which suffer from high Ohmic loss. Here, we designed and fabricated a dielectric Fano resonator based on a pair of silicon (Si) ellipses and a disk, which supports mode hybridization between a quasi-bound state in the continuum (quasi-BIC) and a Mie resonance. We demonstrated the performance of the resonant system by interfacing it with single-photon emitters (SPEs) based on negatively charged nitrogen-vacancy (NV-) centers in nanodiamonds (NDs). The interfaced emitters show a Purcell enhancement factor of ~10, with a sub-ns emission lifetime and a polarization contrast of 9. Our results indicate a promising method for developing efficient and compact single-photon sources for integrated quantum photonics applications.
Submitted 3 July, 2024;
originally announced July 2024.
-
What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks
Authors:
Chengrui Huang,
Zhengliang Shi,
Yuntao Wen,
Xiuying Chen,
Peng Han,
Shen Gao,
Shuo Shang
Abstract:
Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications. Many existing works fine-tune LLMs or design prompts that enable LLMs to select appropriate tools and invoke them correctly to meet user requirements. However, previous works have observed that the performance of tool learning varies across tasks, datasets, training settings, and algorithms. Failing to understand the impact of these factors can lead to inconsistent results, inefficient model deployment, and suboptimal tool utilization, ultimately hindering the practical integration and scalability of LLMs in real-world scenarios. Therefore, in this paper, we explore the impact of both internal and external factors on the performance of tool learning frameworks. Through extensive experiments on two benchmark datasets, we draw several insightful conclusions for future work, including the observation that LLMs can benefit significantly from increased trial and exploration. We believe our empirical study provides a new perspective for future tool learning research.
Submitted 3 July, 2024;
originally announced July 2024.