-
Co-designing a Child-Robot Relational Norm Intervention to Regulate Children's Handwriting Posture
Authors:
Chenyang Wang,
Daniel Carnieto Tozadore,
Barbara Bruno,
Pierre Dillenbourg
Abstract:
Persuasive social robots employ their social influence to modulate children's behaviours in child-robot interaction. In this work, we introduce the Child-Robot Relational Norm Intervention (CRNI) model, leveraging the passive role of social robots and children's reluctance to inconvenience others to influence children's behaviours. Unlike traditional persuasive strategies that employ robots in active roles, CRNI utilizes an indirect approach by generating a disturbance for the robot in response to improper child behaviours, thereby motivating behaviour change through the avoidance of norm violations. The feasibility of CRNI is explored with a focus on improving children's handwriting posture. To this end, as a preliminary work, we conducted two participatory design workshops with 12 children and 1 teacher to identify effective disturbances that can promote posture correction.
Submitted 13 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Authors:
Heng Yu,
Chaoyang Wang,
Peiye Zhuang,
Willi Menapace,
Aliaksandr Siarohin,
Junli Cao,
Laszlo A Jeni,
Sergey Tulyakov,
Hsin-Ying Lee
Abstract:
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependency on multi-view generative models and instead fully utilizing video generative models trained on diverse real-world datasets. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video, delicately generated from the reference video. To handle inconsistencies in the freeze-time video, we jointly learn a per-frame deformation to model these imperfections. We then learn the temporal deformation based on the canonical representation to capture dynamic interactions in the reference video. The pipeline facilitates the generation of dynamic scenes with enhanced photorealism and structural integrity, viewable from multiple perspectives, thereby setting a new standard in 4D scene generation.
Submitted 11 June, 2024;
originally announced June 2024.
-
Graph-based multi-Feature fusion method for speech emotion recognition
Authors:
Xueyu Liu,
Jie Lin,
Chao Wang
Abstract:
Finding a proper way to fuse multiple speech features for cross-corpus speech emotion recognition is crucial, as different speech features can provide complementary cues reflecting human emotional states. While most previous approaches extract only a single speech feature for emotion recognition, existing fusion methods such as concatenation, parallel connection, and splicing ignore heterogeneous patterns in the interactions between features, which limits the performance of existing systems. In this paper, we propose a novel graph-based fusion method to explicitly model the relationships between every pair of speech features. Specifically, we propose a multi-dimensional edge feature learning strategy called Graph-based multi-Feature fusion method for speech emotion recognition. It represents each speech feature as a node and learns multi-dimensional edge features to explicitly describe the relationship between each feature-feature pair in the context of emotion recognition. This way, the learned multi-dimensional edge features encode speech feature-level information from both the vertex and edge dimensions. Our approach consists of three modules: an Audio Feature Generation (AFG) module, an Audio-Feature Multi-dimensional Edge Feature (AMEF) module, and a Speech Emotion Recognition (SER) module. The proposed method yields satisfactory results on the SEWA dataset and outperforms the baseline of the AVEC 2019 Workshop and Challenge. We used data from two cultures, German and Hungarian, from the SEWA dataset as our training and validation sets; the CCC scores for German are improved by 17.28% for arousal and 7.93% for liking. Our method also shows a 13% improvement over alternative fusion techniques, including those employing one-dimensional edge-based feature fusion.
Submitted 13 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
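For readers unfamiliar with the metric quoted in the abstract above: CCC is read here as the concordance correlation coefficient, the usual evaluation metric in the AVEC challenges. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `concordance_cc` and the use of population (biased) statistics are assumptions made for illustration.

```python
import math


def concordance_cc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_pred - mean_gold)^2)
    Population (biased) variance and covariance are used (an assumption).
    """
    n = len(pred)
    if n == 0 or n != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Both sequences are the same constant; the ratio is undefined.
        return math.nan
    return 2.0 * cov / denom
```

Unlike Pearson correlation, this score drops when predictions are shifted or rescaled relative to the labels, which is why a perfectly correlated but offset prediction scores below 1.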
-
Movable-Antenna Array Empowered ISAC Systems for Low-Altitude Economy
Authors:
Ziming Kuang,
Wenchao Liu,
Chunjie Wang,
Zhenzhen Jin,
Jinke Ren,
Xuhui Zhang,
Yanyan Shen
Abstract:
This paper investigates a movable-antenna (MA) array empowered integrated sensing and communications (ISAC) over low-altitude platform (LAP) system to support low-altitude economy (LAE) applications. In the considered system, an unmanned aerial vehicle (UAV) is dispatched to hover in the air, working as the UAV-enabled LAP (ULAP) to provide information transmission and sensing simultaneously for LAE applications. To improve the throughput capacity, we formulate a data rate maximization problem by jointly optimizing the transmit information and sensing beamforming and the antenna positions of the MA array. Since the data rate maximization problem is non-convex with highly coupled variables, we propose an efficient alternating-optimization-based algorithm, which iteratively optimizes a subset of the variables while fixing the others. Numerical results show the superiority of the proposed MA array-based scheme in terms of the achievable data rate and beamforming gain compared with two benchmark schemes.
Submitted 11 June, 2024;
originally announced June 2024.
-
Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth of Field
Authors:
Chao Wang,
Krzysztof Wolski,
Bernhard Kerbl,
Ana Serrano,
Mojtaba Bemana,
Hans-Peter Seidel,
Karol Myszkowski,
Thomas Leimkühler
Abstract:
Radiance field methods represent the state of the art in reconstructing complex scenes from multi-view photos. However, these reconstructions often suffer from one or both of the following limitations: First, they typically represent scenes in low dynamic range (LDR), which restricts their use to evenly lit environments and hinders immersive viewing experiences. Secondly, their reliance on a pinhole camera model, assuming all scene elements are in focus in the input images, presents practical challenges and complicates refocusing during novel-view synthesis. Addressing these limitations, we present a lightweight method based on 3D Gaussian Splatting that utilizes multi-view LDR images of a scene with varying exposure times, apertures, and focus distances as input to reconstruct a high-dynamic-range (HDR) radiance field. By incorporating analytical convolutions of Gaussians based on a thin-lens camera model as well as a tonemapping module, our reconstructions enable the rendering of HDR content with flexible refocusing capabilities. We demonstrate that our combined treatment of HDR and depth of field facilitates real-time cinematic rendering, outperforming the state of the art.
Submitted 21 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Compact Polarization-Entangled Photon Source Based on Coexisting Noncritically Birefringent and Quasi Phase Matching in a Nonlinear Crystal
Authors:
C. -Y. Yang,
C. -Y. Wang,
K. -H. Lin,
T. -Y. Tsai,
C. -C. Lin,
C. Canalias,
L. -B. Wang,
A. Yabushita,
C. -S. Chuu
Abstract:
Polarization-entangled photons are indispensable to numerous quantum technologies and fundamental studies. In this paper, we propose and demonstrate a novel source that generates collinear polarization-entangled photons by simultaneously achieving two distinct types of phase-matching conditions (noncritically birefringent and quasi phase matching) in a periodically poled nonlinear crystal with a large poling period of 2 mm. The photon pairs are generated in a polarization-entangled state with a fidelity and concurrence of 0.998 and 0.935, respectively, and violate the Clauser-Horne-Shimony-Holt inequality by 84 standard deviations. The compact source does not require an interferometer, delicate domain structures, or post-selection, and is advantageous for scalable quantum computing and communication, where many replicas or chip-scale devices are needed.
Submitted 11 June, 2024;
originally announced June 2024.
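As background for the Clauser-Horne-Shimony-Holt (CHSH) violation mentioned in the abstract above, the sketch below evaluates the CHSH parameter S for an ideal polarization-entangled pair. It is a textbook illustration under stated assumptions, not the authors' analysis: the state is assumed to be (|HH> + |VV>)/sqrt(2), for which the polarization correlation at analyzer angles α and β is cos 2(α − β), and the function names and angle settings are chosen for illustration.

```python
import math


def correlation(alpha_deg, beta_deg):
    """Ideal correlation E(alpha, beta) for the assumed state (|HH> + |VV>)/sqrt(2)."""
    return math.cos(2.0 * math.radians(alpha_deg - beta_deg))


def chsh_s(a, a_prime, b, b_prime, corr=correlation):
    """CHSH parameter S = E(a,b) - E(a,b') + E(a',b) + E(a',b').

    Local hidden-variable models satisfy |S| <= 2.
    """
    return corr(a, b) - corr(a, b_prime) + corr(a_prime, b) + corr(a_prime, b_prime)


# Standard analyzer settings (degrees) that maximize S for this state.
s_value = chsh_s(0.0, 45.0, 22.5, 67.5)
print(f"S = {s_value:.4f} (classical bound 2, quantum maximum {2 * math.sqrt(2):.4f})")
```

An experiment reports how far the measured S lies above 2 in units of its statistical uncertainty, which is the sense of "standard deviations" in the abstract.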
-
Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit
Authors:
Lineghuan Meng,
Chuang Wang
Abstract:
This letter presents a high-dimensional analysis of the training dynamics for a single-layer nonlinear contrastive learning model. The empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE). Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs), reflecting the evolution of the model performance during the training process. We analyze the locations and stability of the fixed points of the ODEs, unveiling several interesting findings. First, only the hidden variable's second moment affects feature learnability at the state with uninformative initialization. Second, higher moments influence the probability of feature selection by controlling the attraction region, rather than affecting local stability. Finally, independent noise added in the data augmentation degrades performance, but negatively correlated noise can reduce the variance of the gradient estimate, yielding better performance. Despite the simplicity of the analyzed model, it exhibits rich training dynamics, paving the way to understanding the more complex mechanisms behind practical large models.
Submitted 10 June, 2024;
originally announced June 2024.
-
Evaluating Zero-Shot Long-Context LLM Compression
Authors:
Chenyu Wang,
Yihan Wang
Abstract:
This study evaluates the effectiveness of zero-shot compression techniques on large language models (LLMs) in long-context settings. We identify a tendency for computational errors to increase with context length when employing certain compression methods. We propose a hypothesis to explain the varied behavior of different LLM compression techniques and explore remedies to mitigate the performance decline observed in some techniques in long-context settings. This is a course report for COS 598D Machine Learning and Systems by Prof. Kai Li at Princeton University. Due to limited computational resources, our experiments were conducted only on LLaMA-2-7B-32K.
Submitted 10 June, 2024;
originally announced June 2024.
-
The measurement of the splashback radius of dark matter halo
Authors:
Weiwei Xu,
Huanyuan Shan,
Ran Li,
Ji Yao,
Chunxiang Wang,
Nan Li,
Chaoli Zhang
Abstract:
In the hierarchical evolution framework of cosmology, larger halos grow through matter accretion and halo mergers. To clarify the halo evolution, we need to define the halo mass and radius physically. However, the pseudo-evolution problem makes the process difficult. Thus, we aim to measure the splashback radius, a physically defined halo radius, for a large number of halos with various masses and redshifts, and to determine the parameters that affect it most. We use the typical definition of the splashback radius (Rsp) as the radius with the steepest radial density profile. In this work, we measure Rsp of dark matter halos with masses of 1e13-3e15 Msun and redshifts spanning 0.08-0.65. This is the Rsp measurement spanning the largest range of halo mass and redshift. Using the shear catalog of the DECaLS DR8, we investigate Rsp of halos associated with galaxies and galaxy clusters identified in the various catalogs. Our findings reveal a trend wherein massive halos show a larger Rsp, and the normalized splashback radius (Rsp/R200m) shows a U-shaped mass evolution. The upturn in these relations mainly comes from the contribution of massive halos with low redshifts. We further find that Rsp increases with the peak height, while Rsp/R200m has a negative relation with the peak height. We also find that Rsp ≳ R200m for most halos, indicating their low accretion rates. Our result is consistent with previous literature across a wide range of mass, redshift, and peak height, as well as the simulation work from More et al. (2015).
Submitted 10 June, 2024;
originally announced June 2024.
-
Brainstorming Brings Power to Large Language Models of Knowledge Reasoning
Authors:
Zining Qin,
Chenhao Wang,
Huiling Qin,
Weijia Jia
Abstract:
Large Language Models (LLMs) have demonstrated amazing capabilities in language generation, text comprehension, and knowledge reasoning. While a single powerful model can already handle multiple tasks, relying on a single perspective can lead to biased and unstable results. Recent studies have further improved the model's reasoning ability on a wide range of tasks by introducing multi-model collaboration. However, models with different capabilities may produce conflicting answers on the same problem, and how to reasonably obtain the correct answer from multiple candidate models has become a challenging problem. In this paper, we propose prompt-based multi-model brainstorming. It incorporates different models into a group for brainstorming, and after multiple rounds of reasoning elaboration and re-inference, a consensus answer is reached within the group. We conducted experiments on three different types of datasets and demonstrated that brainstorming can significantly improve effectiveness in logical reasoning and fact extraction. Furthermore, we find that two small-parameter models can achieve accuracy approximating that of larger-parameter models through brainstorming, which provides a new solution for distributed deployment of LLMs.
Submitted 2 June, 2024;
originally announced June 2024.
-
Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace
Authors:
Chenxu Wang,
Boyuan Du,
Jiaxin Xu,
Peiyan Li,
Di Guo,
Huaping Liu
Abstract:
Human-robot collaboration (HRC) in a shared workspace has become a common pattern in real-world robot applications and has garnered significant research interest. However, most existing studies of human-in-the-loop (HITL) collaboration with robots in a shared workspace are evaluated either in simplified game environments or on physical platforms, and therefore suffer from limited realism or limited scalability. To support future studies, we build an embodied framework named HumanTHOR, which enables humans to act in the simulation environment through VR devices to support HITL collaborations in a shared workspace. To validate our system, we build a benchmark of everyday tasks and conduct a preliminary user study with two baseline algorithms. The results show that the robot can effectively assist humans in collaboration, demonstrating the significance of HRC. The comparison among different levels of baselines affirms that our system can adequately evaluate robot capabilities and serve as a benchmark for different robot algorithms. The experimental results also indicate that there is still much room for improvement in the area and that our system can provide a preliminary foundation for future HRC research in a shared workspace. More information about the simulation environment, experiment videos, benchmark descriptions, and additional supplementary materials can be found on the website: https://sites.google.com/view/humanthor/.
Submitted 10 June, 2024;
originally announced June 2024.
-
Global-in-time energy stability analysis for the exponential time differencing Runge-Kutta scheme for the phase field crystal equation
Authors:
Xiao Li,
Zhonghua Qiao,
Cheng Wang,
Nan Zheng
Abstract:
The global-in-time energy estimate is derived for the second-order accurate exponential time differencing Runge-Kutta (ETDRK2) numerical scheme for the phase field crystal (PFC) equation, a sixth-order parabolic equation modeling crystal evolution. To recover the value of the stabilization constant, some local-in-time convergence analysis has been reported, and the energy stability becomes available over a fixed final time. In this work, we develop a global-in-time energy estimate for the ETDRK2 numerical scheme for the PFC equation by showing the energy dissipation property for any final time. An a priori assumption at the previous time step, combined with a single-step $H^2$ estimate of the numerical solution, is the key point in the analysis. Such an $H^2$ estimate recovers the maximum norm bound of the numerical solution at the next time step, and then the value of the stabilization parameter can be theoretically justified. This justification ensures the energy dissipation at the next time step, so that mathematical induction can be effectively applied, which establishes the global-in-time energy estimate. This paper represents the first effort to theoretically establish a global-in-time energy stability analysis for a second-order stabilized numerical scheme in terms of the original free energy functional. The presented methodology is expected to be applicable to many other Runge-Kutta numerical schemes for gradient flow equations.
Submitted 10 June, 2024;
originally announced June 2024.
-
RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
Authors:
Yujie Chen,
Jiangyan Yi,
Jun Xue,
Chenglong Wang,
Xiaohui Zhang,
Shunbo Dong,
Siding Zeng,
Jianhua Tao,
Lv Zhao,
Cunhang Fan
Abstract:
Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use a sinc layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba to address Mamba's unidirectional modelling problem and further capture long-range feature information. Moreover, we develop a bidirectional fusion module to integrate embeddings, enhancing audio context representation and combining short- and long-range information. The results show that our proposed RawBMamba achieves a 34.1% improvement over Rawformer on the ASVspoof2021 LA dataset, and demonstrates competitive performance on other datasets.
Submitted 18 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
Authors:
Peiye Zhuang,
Songfang Han,
Chaoyang Wang,
Aliaksandr Siarohin,
Jiaxu Zou,
Michael Vasilkovsky,
Vladislav Shakhrai,
Sergey Korolev,
Sergey Tulyakov,
Hsin-Ying Lee
Abstract:
We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models like LRM that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images. However, in our method, we introduce several important modifications that allow us to significantly enhance 3D reconstruction quality. First, we examine the original LRM architecture and find several shortcomings. We then introduce corresponding modifications to the LRM architecture, which lead to improved multi-view image representation and more computationally efficient training. Second, in order to improve geometry reconstruction and enable supervision at full image resolution, we extract meshes from the NeRF field in a differentiable manner and fine-tune the NeRF model through mesh rendering. These modifications allow us to achieve state-of-the-art performance on both 2D and 3D evaluation metrics, such as a PSNR of 28.67 on the Google Scanned Objects (GSO) dataset. Despite these superior results, our feed-forward model still struggles to reconstruct complex textures, such as text and portraits on assets. To address this, we introduce a lightweight per-instance texture refinement procedure. This procedure fine-tunes the triplane representation and the NeRF color estimation model on the mesh surface using the input multi-view images in just 4 seconds. This refinement improves the PSNR to 29.79 and achieves faithful reconstruction of complex textures, such as text. Additionally, our approach enables various downstream applications, including text- or image-to-3D generation.
Submitted 13 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
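The PSNR figures quoted in the abstract above follow the standard peak signal-to-noise ratio definition, PSNR = 10 log10(MAX² / MSE). The sketch below shows that definition on flat lists of pixel values; it is an illustration only, not the authors' evaluation pipeline, and the function name `psnr` and the default peak value of 1.0 (images scaled to [0, 1]) are assumptions.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the largest possible pixel value (1.0 assumes images in [0, 1]).
    """
    if not reference or len(reference) != len(rendered):
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of about 1 dB corresponds to a sizeable relative drop in mean squared error, so small-looking PSNR differences can reflect visible quality changes.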
-
The Invertibility of Cellular Automata with Memory: Correcting Errors and New Conclusions
Authors:
Chen Wang,
Xiang Deng,
Chao Wang
Abstract:
Cellular automata with memory (CAM) are widely used in fields such as image processing, pattern recognition, simulation, and cryptography. The invertibility of CAM is generally considered to be chaotic. Paper [Invertible behavior in elementary cellular automata with memory, Juan C. Seck-Tuoh-Mora et al., Information Sciences, 2012] presented necessary and sufficient conditions for the invertibility of elementary CAM, but it contains a critical error: it classifies identity CAM as non-invertible, whereas identity CAM is undoubtedly invertible. By integrating Amoroso's algorithm and cycle graphs, we provide the correct necessary and sufficient conditions for the invertibility of one-dimensional CAM. Additionally, we link CAM to a specific type of cellular automaton that is isomorphic to CAM, behaves identically, and has easily determinable invertibility. This makes it a promising alternative tool for CAM applications.
Submitted 9 June, 2024;
originally announced June 2024.
-
RAG-Enhanced Commit Message Generation
Authors:
Linghao Zhang,
Hongyi Zhang,
Chong Wang,
Peng Liang
Abstract:
Commit messages are among the most important pieces of textual information in software development and maintenance. However, it is time-consuming and labor-intensive to write commit messages manually. Commit Message Generation (CMG) has become a research hotspot in automated software engineering. Researchers have proposed several methods for CMG and achieved great results. In recent years, CodeBERT, CodeT5, and other Pre-trained Language Models (PLMs) for code have been proposed. These models can be easily transferred to code-related downstream tasks including CMG with simple fine-tuning and can achieve impressive performance. Moreover, Large Language Models (LLMs) with code capabilities (e.g., ChatGPT, Llama 3, Gemma) can be directly applied to various tasks by designing instruction prompts, without training. This brings new possibilities to the CMG task. In this work, we propose REACT, a novel REtrieval-Augmented framework for CommiT message generation, which effectively integrates advanced retrieval techniques with different PLMs and LLMs and can broadly enhance the performance of various models on the CMG task. Specifically, we design and build a hybrid retriever to retrieve the most relevant code diff and commit message pair from the code base as an "exemplar". Then, the retrieved pair is utilized to guide and enhance the generation of commit messages by PLMs and LLMs through fine-tuning and in-context learning. Our approach is evaluated on a widely used dataset. The experimental results show that REACT significantly enhances the performance of various models on the CMG task, improving the BLEU score of CodeT5 by up to 55%, boosting Llama 3's BLEU score by 102%, and substantially surpassing all baselines, achieving a new SOTA. This demonstrates the effectiveness and broad applicability of our framework, which can enhance CMG by a large margin.
Submitted 14 June, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
Stabilizing period-doubled density waves by spin-orbit coupling in Bose-Einstein condensates in optical lattices
Authors:
Chenhui Wang,
Yongping Zhang
Abstract:
In atomic Bose-Einstein condensates in optical lattices, mean-field energy can support the existence of period-doubled density waves, which are similar to Bloch waves but have the double periodicity of the underlying lattice potentials. However, they are dynamically unstable. Here, we propose to use the spin-orbit coupling to stabilize the period-doubled density waves. The stabilization mechanism is revealed to relate to interaction-induced spontaneous symmetry breaking of the spin-flip parity symmetry.
Submitted 8 June, 2024;
originally announced June 2024.
-
Quantum erasure based on phase structure
Authors:
Ye Yang,
Chengyuan Wang,
Yun Chen,
Jianyi Xv,
Xin Yang,
**wen Wang,
Shuwei Qiu,
Hong Gao,
Fuli Li
Abstract:
The quantum eraser effect exemplifies the distinct properties of quantum mechanics that challenge classical intuition and expose the wave-particle duality of light. This effect has been extensively explored in various experiments; most of these investigations use polarisation to distinguish which-path information, and less attention has been paid to the phase structure, which is related to the wavefront of the photon. In this study, we introduce a theoretical framework for quantum erasure that focusses on the phase structure and demonstrate it experimentally. In this experiment, we employ a Mach-Zehnder interferometer (MZI) where a first-order spiral phase plate (SPP) is integrated into one of its arms. This setup applies orbital angular momentum (OAM) to the photons and establishes predetermined which-way information. Consequently, the photon demonstrates its particle characteristics, with an absence of interference at the MZI's output ports. Utilizing an additional SPP to erase the phase structure from the output photon results in pronounced interference patterns, observable in a post-measurement scenario. This result allows us to include the structure information of the equiphase plane of the light field in quantum erasure. The results challenge the traditional cause-effect relationship in classical physics, given that the subsequent choice of the SPP adheres to a space-like separation.
Submitted 18 May, 2024;
originally announced June 2024.
-
LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification
Authors:
Xin Cai,
Hailong Zhang,
Chenchen Wang,
Wentao Liu,
**wei Gu,
Tianfan Xue
Abstract:
Lensless cameras, innovatively replacing traditional lenses with ultra-thin, flat optics, encode light directly onto sensors, producing images that are not immediately recognizable. This compact, lightweight, and cost-effective imaging solution offers inherent privacy advantages, making it attractive for privacy-sensitive applications like face verification. Typical lensless face verification adopts a two-stage process of reconstruction followed by verification, incurring privacy risks from reconstructed faces and high computational costs. This paper presents an end-to-end optimization approach for privacy-preserving face verification directly on encoded lensless captures, ensuring that the entire software pipeline remains encoded with no visible faces as intermediate results. To achieve this, we propose several techniques to address unique challenges from the lensless setup, which precludes traditional face detection and alignment. Specifically, we propose a face center alignment scheme, an augmentation curriculum to build robustness against variations, and a knowledge distillation method to smooth optimization and enhance performance. Evaluations in both simulated and real environments demonstrate that our method outperforms two-stage lensless verification while enhancing privacy and efficiency. Project website: lenslessface.github.io.
Submitted 6 June, 2024;
originally announced June 2024.
-
On the Scalar Curvature Compactness Conjecture in the Conformal Case
Authors:
Brian Allen,
Wenchuan Tian,
Changliang Wang
Abstract:
Is a sequence of Riemannian manifolds with positive scalar curvature, satisfying some conditions to keep the sequence reasonable, compact? What topology should one use for the convergence and what is the regularity of the limit space? In this paper we explore these questions by studying the case of a sequence of Riemannian manifolds which are conformal to the $n$-dimensional round sphere. We are able to show that the sequence of conformal factors is compact in several analytic senses and to establish $C^0$ convergence away from a singular set of small volume, in a fashion similar to C. Dong. Under a bound on the total scalar curvature we are able to show that the limit conformal factor has weak positive scalar curvature in the sense of weakly solving the conformal positive scalar curvature equation.
Submitted 6 June, 2024;
originally announced June 2024.
-
Latent Neural Operator for Solving Forward and Inverse PDE Problems
Authors:
Tian Wang,
Chuang Wang
Abstract:
Neural operators effectively solve PDE problems from data without knowing the explicit equations, by learning the map from input sequences of observed samples to the predicted values. Most existing works build the model in the original geometric space, leading to high computational costs when the number of sample points is large. We present the Latent Neural Operator (LNO), which solves PDEs in the latent space. In particular, we first propose Physics-Cross-Attention (PhCA), which transforms representations from the geometric space to the latent space, then learn the operator in the latent space, and finally recover the real-world geometric space via the inverse PhCA map. Our model retains the flexibility to decode values at any position, not limited to locations defined in the training set, and can therefore naturally perform interpolation and extrapolation tasks, which are particularly useful for inverse problems. Moreover, the proposed LNO improves both prediction accuracy and computational efficiency. Experiments show that LNO reduces GPU memory by 50%, speeds up training 1.8 times, and reaches state-of-the-art accuracy on four out of six benchmarks for forward problems and on a benchmark for the inverse problem.
Submitted 9 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Authors:
Chen Wang,
Minpeng Liao,
Zhongqiang Huang,
Junhong Wu,
Chengqing Zong,
Jiajun Zhang
Abstract:
The recent release of GPT-4o showcased the potential of end-to-end multimodal models, not just in terms of low latency but also in their ability to understand and generate expressive speech with rich emotions. While the details are unknown to the open research community, it likely involves significant amounts of curated data and compute, neither of which is readily accessible. In this paper, we present BLSP-Emo (Bootstrapped Language-Speech Pretraining with Emotion support), a novel approach to developing an end-to-end speech-language model capable of understanding both semantics and emotions in speech and generating empathetic responses. BLSP-Emo utilizes existing speech recognition (ASR) and speech emotion recognition (SER) datasets through a two-stage process. The first stage focuses on semantic alignment, following recent work on pretraining speech-language models using ASR data. The second stage performs emotion alignment with the pretrained speech-language model on an emotion-aware continuation task constructed from SER data. Our experiments demonstrate that the BLSP-Emo model excels in comprehending speech and delivering empathetic responses, both in instruction-following tasks and conversations.
Submitted 6 June, 2024;
originally announced June 2024.
-
Optical biomarker of metabolism for breast tumor diagnosis: Insights from subcellular dynamics
Authors:
Zichen Yin,
Shuwei Zhang,
Bin He,
Houpu Yang,
Zhengyu Chen,
Zhangwei Hu,
Yejiong Shi,
Ruizhi Xue,
Panqi Yang,
Yuzhe Ying,
Chengming Wang,
Shu Wang,
** Xue
Abstract:
Label-free metabolic dynamics contrast is highly appealing but difficult to achieve in biomedical imaging. Interference offers a highly sensitive mechanism for capturing the metabolic dynamics of the subcellular scatterers. However, traditional interference detection methods fail to isolate pure metabolic dynamics, as the dynamic signals are coupled with scatterer reflectivity and other uncontrollable imaging factors. Here, we demonstrate active phase modulation-assisted dynamic full-field optical coherence tomography (APMD-FFOCT) that decouples and quantifies the metabolic dynamics by adding a reference movement for all interferential scatterers. This novel technique enables imaging and dynamic analysis of subcellular structures along with their changes during the apoptotic process in tumor tissues. Furthermore, the nucleus-to-cytoplasm dynamic intensity ratio could serve as an optical biomarker for breast tumor grading, enhancing intraoperative diagnosis.
Submitted 6 June, 2024;
originally announced June 2024.
-
A second-order accurate, original energy dissipative numerical scheme for chemotaxis and its convergence analysis
Authors:
Jie Ding,
Cheng Wang,
Shenggao Zhou
Abstract:
This paper proposes a second-order accurate numerical scheme for the Patlak-Keller-Segel system with various mobilities for the description of chemotaxis. Formulated in a variational structure, the entropy part is novelly discretized by a modified Crank-Nicolson approach so that the solution to the proposed nonlinear scheme corresponds to a minimizer of a convex functional. A careful theoretical analysis reveals that the unique solvability and positivity-preserving property could be theoretically justified. More importantly, such a second order numerical scheme is able to preserve the dissipative property of the original energy functional, instead of a modified one. To the best of our knowledge, the proposed scheme is the first second-order accurate one in literature that could achieve both the numerical positivity and original energy dissipation. In addition, an optimal rate convergence estimate is provided for the proposed scheme, in which rough and refined error estimate techniques have to be included to accomplish such an analysis. Ample numerical results are presented to demonstrate robust performance of the proposed scheme in preserving positivity and original energy dissipation in blowup simulations.
Submitted 6 June, 2024;
originally announced June 2024.
-
Knowledge-Infused Legal Wisdom: Navigating LLM Consultation through the Lens of Diagnostics and Positive-Unlabeled Reinforcement Learning
Authors:
Yang Wu,
Chenghao Wang,
Ece Gumusel,
Xiaozhong Liu
Abstract:
The integration of generative Large Language Models (LLMs) into various applications, including the legal domain, has been accelerated by their expansive and versatile nature. However, when facing a legal case, users without a legal background often struggle to formulate professional queries and may inadvertently overlook critical legal factors when presenting their case narrative to LLMs. To address this issue, we propose the Diagnostic Legal Large Language Model (D3LM), which utilizes adaptive lawyer-like diagnostic questions to collect additional case information and then provides high-quality feedback. D3LM incorporates an innovative graph-based Positive-Unlabeled Reinforcement Learning (PURL) algorithm, enabling the generation of critical questions and enhancing user-LLM interactions. Moreover, an integrated LLM-based stopping criterion facilitates precise Court Views Generation (CVG). Our research also introduces a new English-language CVG dataset based on the US case law database, enriching the realm of LLM research and deployment with a vital dimension. D3LM surpasses classical LLMs by delivering outstanding performance and a remarkable user experience in the legal domain.
Submitted 5 June, 2024;
originally announced June 2024.
-
A Multivariate Equivalence Test Based on Mahalanobis Distance with a Data-Driven Margin
Authors:
Chao Wang,
Yu-Ting Weng,
Shaobo Liu,
Tengfei Li,
Meiyu Shen,
Yi Tsong
Abstract:
Multivariate equivalence testing is needed in a variety of scenarios for drug development. For example, drug products obtained from natural sources may contain many components for which the individual effects and/or their interactions on clinical efficacy and safety cannot be completely characterized. Such lack of sufficient characterization poses a challenge for both generic drug developers to demonstrate and regulatory authorities to determine the sameness of a proposed generic product to its reference product. Another case is to ensure batch-to-batch consistency of naturally derived products containing a vast number of components, such as botanical products. The equivalence or sameness between products containing many components that cannot be individually evaluated needs to be studied in a holistic manner. A multivariate equivalence test based on Mahalanobis distance may be suitable to evaluate many variables holistically. Existing studies based on this method assumed either a predetermined constant margin, for which a consensus is difficult to achieve, or a margin derived from the data, where, however, the randomness is ignored during the testing. In this study, we propose a multivariate equivalence test based on Mahalanobis distance with a data-driven margin, with the randomness in the margin taken into account. Several possible implementations are compared with existing approaches via extensive simulation studies.
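As a rough illustration of the quantity this abstract builds on, the sketch below computes a Mahalanobis distance between the sample means of a test product and a reference product. It is a minimal sketch under stated assumptions, not the authors' test: the pooled-covariance scaling, the function names, and the toy data are all illustrative choices, and the data-driven margin itself is not implemented.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows):
    """Unbiased sample covariance matrix."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(test_rows, ref_rows):
    """Distance between the two sample means, scaled by the pooled covariance.

    Pooling the two covariance matrices is an assumption made for this sketch.
    """
    d = [t - r for t, r in zip(mean_vector(test_rows), mean_vector(ref_rows))]
    ct, cr = covariance(test_rows), covariance(ref_rows)
    nt, nr = len(test_rows), len(ref_rows)
    p = len(d)
    pooled = [[((nt - 1) * ct[i][j] + (nr - 1) * cr[i][j]) / (nt + nr - 2)
               for j in range(p)] for i in range(p)]
    z = solve(pooled, d)  # z = pooled^{-1} d
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

# Toy data: the test batch is the reference batch shifted by 2 in the first variable.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
test = [(2.0, 0.0), (4.0, 0.0), (2.0, 2.0), (4.0, 2.0)]
distance = mahalanobis(test, reference)
```

An equivalence test would then compare such a distance (or a confidence bound on it) against a margin; how that margin is derived from the data, and how its randomness is handled, is the subject of the paper and is not reproduced here.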
Submitted 5 June, 2024;
originally announced June 2024.
-
Exceptional Fano varieties with small minimal log discrepancy
Authors:
Louis Esser,
Jihao Liu,
Chengxi Wang
Abstract:
We construct exceptional Fano varieties with the smallest known minimal log discrepancies in all dimensions. These varieties are well-formed hypersurfaces in weighted projective space. Their minimal log discrepancies decay doubly exponentially with dimension, and achieve the optimal value in dimension 2.
Submitted 5 June, 2024;
originally announced June 2024.
-
Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA
Authors:
Chunjie Wang,
Xuhui Zhang,
Huijun Xing,
Liang Xue,
Shuqiang Wang,
Yanyan Shen,
Bo Yang,
** Guan
Abstract:
Intelligent reflecting surface (IRS) and rate-splitting multiple access (RSMA) technologies are at the forefront of enhancing spectrum and energy efficiency in next-generation multi-antenna communication systems. This paper explores an RSMA system with multiple IRSs and proposes two purpose-driven scheduling schemes, i.e., the exhaustive IRS-aided (EIA) and opportunistic IRS-aided (OIA) schemes. The aim is to optimize the system weighted energy efficiency (EE) under each of the two schemes. Specifically, the Dinkelbach, branch and bound, successive convex approximation, and semidefinite relaxation methods are exploited within the alternating optimization framework to obtain effective solutions to the considered problems. The numerical findings indicate that the EIA scheme outperforms the OIA scheme in diverse scenarios in terms of the weighted EE, and that the proposed algorithm outperforms the baseline algorithms.
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurement of the branching fraction ratios $R(D^{+})$ and $R(D^{*+})$ using muonic $τ$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1063 additional authors not shown)
Abstract:
The branching fraction ratios of $\overline{B}^0\to D^+τ^-\overline{ν}_τ$ and $\overline{B}^0\to D^{*+}τ^-\overline{ν}_τ$ decays are measured with respect to their muonic counterparts, using a data sample corresponding to an integrated luminosity of 2.0 fb$^{-1}$ collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 13$ TeV. The reconstructed final states are formed by combining $D^+$ mesons with $τ^-\to μ^-\overline{ν}_μ ν_τ$ candidates, where the $D^+$ is reconstructed via the $D^+\to K^-π^+π^+$ decay. The results are
\begin{align*}
R(D^{+}) &= 0.249 \pm 0.043 \pm 0.047, \\
R(D^{*+}) &= 0.402 \pm 0.081 \pm 0.085,
\end{align*}
where the first uncertainties are statistical and the second systematic. The two measurements have a correlation coefficient of $-0.39$ and are compatible with the Standard Model.
Submitted 5 June, 2024;
originally announced June 2024.
-
ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection
Authors:
Jiangning Zhang,
Haoyang He,
Zhenye Gan,
Qingdong He,
Yuxuan Cai,
Zhucun Xue,
Yabiao Wang,
Chengjie Wang,
Lei Xie,
Yong Liu
Abstract:
Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, with increasing application demand and value in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is a lack of comprehensive benchmarks to adequately evaluate the performance of various mainstream methods across different datasets under the practical multi-class setting. The absence of standardized experimental setups can lead to potential biases in training epochs, resolution, and metric results, resulting in erroneous conclusions. This paper addresses this issue by proposing a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework that is highly extensible for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. Additionally, we have open-sourced the GPU-assisted ADEval package (https://pypi.org/project/ADEval) to address the slow evaluation problem of metrics like time-consuming mAU-PRO on large-scale data, significantly reducing evaluation time by more than 1000-fold. Through extensive experimental results, we objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection. We hope that ADer will become a valuable resource for researchers and practitioners in the field, promoting the development of more robust and generalizable anomaly detection systems. Full codes have been attached in Appendix and open-sourced at https://github.com/zhangzjn/ader.
Submitted 6 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Observation of new charmonium(-like) states in $B^+ \to D^{*\pm} D^{\mp} K^+$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contributions from resonances decaying to $D^{\ast-}D^{+}$ and $D^{\ast+}D^{-}$ states linked by $C$ parity. This procedure allows the $C$-parities of resonances in the $D^{\ast\pm}D^{\mp}$ mass spectra to be determined. Four charmonium(-like) states are observed decaying into $D^{\ast\pm}D^{\mp}$: $η_c(3945)$, $h_c(4000)$, $χ_{c1}(4010)$ and $h_c(4300)$, with quantum numbers $J^{PC}$ equal to $0^{-+}$, $1^{+-}$, $1^{++}$ and $1^{+-}$, respectively. At least three of these states have not been observed previously. In addition, the existence of the $T_{\bar{c}\bar{s}0}^{*}(2870)^{0}$ and $T_{\bar{c}\bar{s}1}^{*}(2900)^{0}$ resonances in the $D^-K^+$ mass spectrum, already observed in the $B^+ \to D^+ D^- K^+$ decay, is confirmed in a different production channel.
Submitted 5 June, 2024;
originally announced June 2024.
-
Decision Boundary-aware Knowledge Consolidation Generates Better Instance-Incremental Learner
Authors:
Qiang Nie,
Weifu Fu,
Yuhuan Lin,
Jialin Li,
Yifeng Zhou,
Yong Liu,
Lei Zhu,
Chengjie Wang
Abstract:
Instance-incremental learning (IIL) focuses on learning continually with data of the same classes. Compared to class-incremental learning (CIL), IIL is seldom explored because it suffers less from catastrophic forgetting (CF). However, besides retaining knowledge, in real-world deployment scenarios where the class space is always predefined, continual and cost-effective model promotion with the potential unavailability of previous data is a more essential demand. Therefore, we first define a new and more practical IIL setting as promoting the model's performance, besides resisting CF, with only new observations. Two issues have to be tackled in the new IIL setting: 1) the notorious catastrophic forgetting because of no access to old data, and 2) broadening the existing decision boundary to new observations because of concept drift. To tackle these problems, our key insight is to moderately broaden the decision boundary to failure cases while retaining the old boundary. Hence, we propose a novel decision boundary-aware distillation method that consolidates knowledge to the teacher to ease the student's learning of new knowledge. We also establish benchmarks on the existing datasets Cifar-100 and ImageNet. Notably, extensive experiments demonstrate that the teacher model can be a better incremental learner than the student model, which overturns previous knowledge distillation-based methods that treat the student as the main role.
Submitted 5 June, 2024;
originally announced June 2024.
-
Population Transformer: Learning Population-level Representations of Intracranial Activity
Authors:
Geeling Chau,
Christopher Wang,
Sabera Talukder,
Vighnesh Subramaniam,
Saraswati Soedarmadji,
Yisong Yue,
Boris Katz,
Andrei Barbu
Abstract:
We present a self-supervised framework that learns population-level codes for intracranial neural recordings at scale, unlocking the benefits of representation learning for a key neuroscience recording modality. The Population Transformer (PopT) lowers the amount of data required for decoding experiments, while increasing accuracy, even on never-before-seen subjects and tasks. We address two key challenges in developing PopT: sparse electrode distribution and varying electrode location across patients. PopT stacks on top of pretrained representations and enhances downstream tasks by enabling learned aggregation of multiple spatially-sparse data channels. Beyond decoding, we interpret the pretrained PopT and fine-tuned models to show how they can be used to provide neuroscience insights learned from massive amounts of data. We release a pretrained PopT to enable off-the-shelf improvements in multi-channel intracranial data decoding and interpretability, and code is available at https://github.com/czlwang/PopulationTransformer.
Submitted 5 June, 2024;
originally announced June 2024.
-
A Multi-Technique Study of C2H4 Adsorption on a Model Single-Atom Rh1 Catalyst
Authors:
Chunlei Wang,
Panukorn Sombut,
Lena Puntscher,
Manuel Ulreich,
Jiri Pavelec,
David Rath,
Jan Balajka,
Matthias Meier,
Michael Schmid,
Ulrike Diebold,
Cesare Franchini,
Gareth S. Parkinson
Abstract:
Single-atom catalysts are potentially ideal model systems to investigate structure-function relationships in catalysis, if the active sites can be uniquely determined. In this work, we study the interaction of C2H4 with a model Rh/Fe3O4(001) catalyst that features 2-, 5-, and 6-fold coordinated Rh adatoms, as well as Rh clusters. Using multiple surface-sensitive techniques in combination with density functional theory (DFT) calculations, we follow the thermal evolution of the system and disentangle the behavior of the different species. C2H4 adsorption is strongest at the 2-fold coordinated Rh1, with a DFT-determined adsorption energy of -2.26 eV. However, desorption occurs at lower temperatures than expected because the Rh migrates into substitutional sites within the support, where the molecule is more weakly bound. The adsorption energy at the 5-fold coordinated Rh sites is predicted to be -1.49 eV, but the superposition of this signal with that from small Rh clusters and additional heterogeneity leads to a broad C2H4 desorption shoulder in TPD above room temperature.
Submitted 5 June, 2024;
originally announced June 2024.
-
U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
Authors:
Chenxin Li,
Xinyu Liu,
Wuyang Li,
Cheng Wang,
Hengyu Liu,
Yixuan Yuan
Abstract:
U-Net has become a cornerstone in various visual applications such as image segmentation and diffusion probability models. While numerous innovative designs and improvements have been introduced by incorporating transformers or MLPs, the networks are still limited to linear modeling of patterns and suffer from deficient interpretability. To address these challenges, our intuition is inspired by the impressive results of the Kolmogorov-Arnold Networks (KANs) in terms of accuracy and interpretability, which reshape neural network learning via a stack of non-linear learnable activation functions derived from the Kolmogorov-Arnold representation theorem. Specifically, in this paper, we explore the untapped potential of KANs in improving backbones for vision tasks. We investigate, modify and re-design the established U-Net pipeline by integrating the dedicated KAN layers on the tokenized intermediate representation, termed U-KAN. Rigorous medical image segmentation benchmarks verify the superiority of U-KAN, which achieves higher accuracy even with less computation cost. We further delve into the potential of U-KAN as an alternative U-Net noise predictor in diffusion models, demonstrating its applicability in generating task-oriented model architectures. These endeavours unveil valuable insights and shed light on the prospect that with U-KAN, you can make a strong backbone for medical image segmentation and generation. Project page: https://yes-ukan.github.io/
Submitted 6 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
SSNet: A Lightweight Multi-Party Computation Scheme for Practical Privacy-Preserving Machine Learning Service in the Cloud
Authors:
Shi** Duan,
Chenghong Wang,
Hongwu Peng,
Yukui Luo,
Wujie Wen,
Caiwen Ding,
Xiaolin Xu
Abstract:
As privacy preservation becomes a pivotal aspect of deep learning (DL) development, multi-party computation (MPC) has gained prominence for its efficiency and strong security. However, the practicality of current MPC frameworks is limited, especially when dealing with large neural networks, as exemplified by the prolonged execution time of 25.8 seconds for secure inference on ResNet-152. The primary challenge lies in the reliance of current MPC approaches on additive secret sharing, which incurs significant communication overhead with non-linear operations such as comparisons. Furthermore, additive sharing scales poorly with the number of parties. In contrast, the evolving landscape of MPC necessitates accommodating a larger number of compute parties and ensuring robust performance against malicious activities or computational failures.
In light of these challenges, we propose SSNet, which, for the first time, employs Shamir's secret sharing (SSS) as the backbone of an MPC-based ML framework. We meticulously develop all framework primitives and operations for secure DL models, tailored to integrate seamlessly with the SSS scheme. SSNet demonstrates the ability to scale up party numbers straightforwardly and embeds strategies to authenticate the computation correctness without incurring significant performance overhead. Additionally, SSNet introduces masking strategies designed to reduce the communication overhead associated with non-linear operations. We conduct comprehensive experimental evaluations on commercial cloud computing infrastructure from Amazon AWS, as well as across diverse prevalent DNN models and datasets. SSNet demonstrates a substantial performance boost, achieving speed-ups ranging from 3x to 14x compared to SOTA MPC frameworks. Moreover, SSNet is also the first framework to be evaluated on a five-party computation setup in the context of secure DL inference.
Submitted 3 June, 2024;
originally announced June 2024.
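The SSNet abstract above names Shamir's secret sharing as its backbone without showing the primitive. The following is a minimal textbook sketch of (t, n) Shamir sharing over a prime field, not SSNet's implementation: the field modulus `PRIME` and the function names `share`, `reconstruct`, and `add_shares` are assumptions chosen for illustration.

```python
import secrets

# Assumed field modulus for this sketch: the Mersenne prime 2^61 - 1.
PRIME = 2**61 - 1


def share(secret, n, t):
    """Split `secret` into n shares; any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 from at least t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def add_shares(a, b):
    """Each party adds its two shares locally; the result shares the sum."""
    return [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a, b)]
```

For example, `reconstruct(share(1234, 5, 3)[:3])` recovers the secret from any three of five shares, and `add_shares` shows why linear operations need no communication between parties. Non-linear operations such as comparisons are where the abstract's masking strategies would come in; they are not sketched here.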
-
D-FaST: Cognitive Signal Decoding with Disentangled Frequency-Spatial-Temporal Attention
Authors:
Weiguo Chen,
Changjian Wang,
Kele Xu,
Yuan Yuan,
Yanru Bai,
Dongsong Zhang
Abstract:
Cognitive Language Processing (CLP), situated at the intersection of Natural Language Processing (NLP) and cognitive science, plays a progressively pivotal role in the domains of artificial intelligence, cognitive intelligence, and brain science. Among the essential areas of investigation in CLP, Cognitive Signal Decoding (CSD) has made remarkable achievements, yet challenges remain related to insufficient global dynamic representation capability and deficiencies in multi-domain feature integration. In this paper, we introduce a novel paradigm for CLP referred to as Disentangled Frequency-Spatial-Temporal Attention (D-FaST). Specifically, we present a novel cognitive signal decoder that operates on disentangled frequency-space-time domain attention. This decoder encompasses three key components: frequency domain feature extraction employing multi-view attention, spatial domain feature extraction utilizing dynamic brain connection graph attention, and temporal feature extraction relying on local time sliding window attention. These components are integrated within a novel disentangled framework. Additionally, to encourage advancements in this field, we have created a new CLP dataset, MNRED. Subsequently, we conducted an extensive series of experiments, evaluating D-FaST's performance on MNRED, as well as on publicly available datasets including ZuCo, BCIC IV-2A, and BCIC IV-2B. Our experimental results demonstrate that D-FaST significantly outperforms existing methods on both our dataset and traditional CSD datasets, establishing a state-of-the-art accuracy score of 78.72% on MNRED and pushing the accuracy score to 78.35% on ZuCo, 74.85% on BCIC IV-2A, and 76.81% on BCIC IV-2B.
Submitted 1 June, 2024;
originally announced June 2024.
-
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation
Authors:
Cong Wang,
Kuan Tian,
Jun Zhang,
Yonghang Guan,
Feng Luo,
Fei Shen,
Zhiwei Jiang,
Qing Gu,
Xiao Han,
Wei Yang
Abstract:
In the field of portrait video generation, the use of single images to generate portrait videos has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals (e.g., text, audio, reference image, pose, depth map) can vary in strength. Among these, weaker conditions often struggle to be effective due to interference from stronger conditions, posing a challenge in balancing these conditions. In our work on portrait video generation, we identified audio signals as particularly weak, often overshadowed by stronger signals such as facial pose and reference image. Moreover, direct training with weak signals often leads to difficulties in convergence. To address this, we propose V-Express, a simple method that balances different control signals through progressive training and a conditional dropout operation. Our method gradually enables effective control by weak conditions, thereby achieving generation capabilities that simultaneously take into account the facial pose, reference image, and audio. The experimental results demonstrate that our method can effectively generate portrait videos controlled by audio. Furthermore, it provides a potential solution for the simultaneous and effective use of conditions of varying strengths.
Submitted 4 June, 2024;
originally announced June 2024.
-
Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls
Authors:
Tianyu Wang,
Ningyuan Chen,
Chun Wang
Abstract:
In contextual optimization, a decision-maker observes historical samples of uncertain variables and associated concurrent covariates, without knowing their joint distribution. Given an additional covariate observation, the goal is to choose a decision that minimizes some operational costs. A prevalent issue here is covariate shift, where the marginal distribution of the new covariate differs from that of the historical samples, leading to decision performance variations with nonparametric or parametric estimators. To address this, we propose a distributionally robust approach that uses an ambiguity set formed by the intersection of two Wasserstein balls, each centered on a typical nonparametric or parametric distribution estimator. Computationally, we establish a tractable reformulation of this distributionally robust optimization problem. Statistically, we provide guarantees for our Wasserstein ball intersection approach under covariate shift by analyzing the measure concentration of the estimators. Furthermore, to reduce computational complexity, we employ a surrogate objective that maintains similar generalization guarantees. Through synthetic and empirical case studies on income prediction and portfolio optimization, we demonstrate the strong empirical performance of our proposed models.
Submitted 4 June, 2024;
originally announced June 2024.
-
M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising
Authors:
Chengjie Wang,
Haokun Zhu,
**long Peng,
Yue Wang,
Ran Yi,
Yunsheng Wu,
Lizhuang Ma,
Jiangning Zhang
Abstract:
Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. Yet, both RGB and 3D data are crucial for anomaly detection, and the datasets are seldom completely clean in practical scenarios. To address the above challenges, this paper delves into RGB-3D multi-modal noisy anomaly detection, proposing a novel noise-resistant M3DM-NR framework that leverages the strong multi-modal discriminative capabilities of CLIP. M3DM-NR consists of three stages: Stage-I introduces the Suspected References Selection module to filter a few normal samples from the training dataset, using the multimodal features extracted by the Initial Feature Extraction, and a Suspected Anomaly Map Computation module to generate a suspected anomaly map to focus on abnormal regions as reference. Stage-II uses the suspected anomaly maps of the reference samples as reference, and inputs image, point cloud, and text information to achieve denoising of the training samples through intra-modal comparison and multi-scale aggregation operations. Finally, Stage-III proposes the Point Feature Alignment, Unsupervised Feature Fusion, Noise Discriminative Coreset Selection, and Decision Layer Fusion modules to learn the pattern of the training dataset, enabling anomaly detection and segmentation while filtering out noise. Extensive experiments show that M3DM-NR outperforms state-of-the-art methods in 3D-RGB multi-modal noisy anomaly detection.
Submitted 4 June, 2024;
originally announced June 2024.
-
Optimizing Air-borne Network-in-a-box Deployment for Efficient Remote Coverage
Authors:
Sidrah Javed,
Yunfei Chen,
Mohamed-Slim Alouini,
Cheng-Xiang Wang
Abstract:
Among the many envisaged drivers for the sixth generation, one comes from the United Nations Sustainable Development Goals 2030: to eliminate digital inequality. Remote coverage in sparsely populated areas, difficult terrains, or emergency scenarios requires on-demand access and flexible deployment with minimal capex and opex. In this context, network-in-a-box (NIB) is an exciting solution that packs the whole wireless network into a single portable and re-configurable box to support multiple access technologies such as WiFi/2G/3G/4G/5G. In this paper, we propose low-altitude platform station (LAPS) based NIBs with a stratospheric high-altitude platform station (HAPS) as backhaul. Specifically, the backhaul employs non-orthogonal multiple access (NOMA) with superposition coding at the transmitting HAPS and successive interference cancellation (SIC) at the receiving NIBs, whereas the access link (AL) employs superposition coding along with regularized zero-forcing (RZF) precoding at the NIB in order to alleviate the computational overhead on the ground users. The required number of airborne NIBs to serve a desired coverage area, their optimal placement, user association, beam optimization, and resource allocation are optimized by maximizing the sum rate of the AL while maintaining the quality of service. Our findings reveal the significance of thorough system planning and communication parameter optimization for enhanced system performance and best coverage under limited resources.
Submitted 4 June, 2024;
originally announced June 2024.
-
Diver: Large Language Model Decoding with Span-Level Mutual Information Verification
Authors:
**liang Lu,
Chen Wang,
Jiajun Zhang
Abstract:
Large language models (LLMs) have shown impressive capabilities in adapting to various tasks when provided with task-specific instructions. However, LLMs using standard decoding strategies often struggle with deviations from the inputs. Intuitively, compliant LLM outputs should reflect the information present in the input, which can be measured by point-wise mutual information (PMI) scores. Therefore, we propose Diver, a novel approach that enhances LLM Decoding through span-level PMI verification. During inference, Diver first identifies divergence steps that may lead to multiple candidate spans. Subsequently, it calculates the PMI scores by assessing the log-likelihood gains of the input if the candidate spans are generated. Finally, the optimal span is selected based on the PMI re-ranked output distributions. We evaluate our method across various downstream tasks, and empirical results demonstrate that Diver significantly outperforms existing decoding methods in both performance and versatility.
Submitted 4 June, 2024;
originally announced June 2024.
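The Diver abstract above relies on point-wise mutual information between the input and a candidate span. As a reminder of the standard definition only (the symbols $x$ for the input and $s$ for a candidate span are notation assumed here, and how Diver conditions these terms on the already-generated prefix is not stated in the abstract):

```latex
\mathrm{PMI}(x; s)
  = \log \frac{p(x, s)}{p(x)\, p(s)}
  = \log p(x \mid s) - \log p(x)
  = \log p(s \mid x) - \log p(s)
```

The middle form matches the abstract's description of "log-likelihood gains of the input if the candidate spans are generated": a span scores highly when generating it makes the input more likely than it was a priori.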
-
Blow-up of solutions to semilinear wave equations with spatial derivatives
Authors:
Kerun Shao,
Hiroyuki Takamura,
Chengbo Wang
Abstract:
For small-amplitude semilinear wave equations with power type nonlinearity on the first-order spatial derivative, the expected sharp upper bound on the lifespan of solutions is obtained for both critical cases and subcritical cases, for all spatial dimensions $n>1$. It is achieved uniformly by constructing the integral equations, deriving the ordinary differential inequality system, and applying an iteration argument. Combined with the former works, the sharp lifespan estimates for this problem are completely established, at least for the spherically symmetric case.
Submitted 4 June, 2024;
originally announced June 2024.
-
Unified one-parameter scaling function for Anderson localization transitions in non-reciprocal non-Hermitian systems
Authors:
C. Wang,
Wenxue He,
X. R. Wang,
Hechen Ren
Abstract:
The conventional one-parameter scaling theory of localization, which uses dimensionless conductances as scaling variables, fails for non-reciprocal non-Hermitian systems such as the Hatano-Nelson model. Here, we propose a one-parameter scaling function using the participation ratio as the scaling variable. Employing a highly accurate numerical procedure based on exact diagonalization, we demonstrate that this one-parameter scaling function can describe Anderson localization transitions of non-reciprocal non-Hermitian systems in one and two dimensions of symmetry classes AI and A. The critical exponents of correlation lengths depend on symmetries and dimensionality only, a typical feature of universality. Moreover, we derive a complex-gap equation based on the self-consistent Born approximation that can determine the disorder at which the point gap closes. The obtained disorders match perfectly the critical disorders of Anderson localization transitions from the one-parameter scaling function. Finally, we show that the one-parameter scaling function is also valid for Anderson localization transitions in reciprocal non-Hermitian systems such as two-dimensional class AII$^\dagger$ and can, thus, serve as a unified scaling function for disordered non-Hermitian systems.
Submitted 4 June, 2024;
originally announced June 2024.
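The abstract above uses the participation ratio as its scaling variable without defining it. A minimal sketch of the common textbook definition, PR = (Σ|ψ_i|²)² / Σ|ψ_i|⁴, follows; whether the paper uses right eigenvectors, a biorthogonal variant, or a different normalization is not stated in the abstract, so this form and the function name `participation_ratio` are assumptions.

```python
def participation_ratio(psi):
    """Participation ratio of a wavefunction given as a sequence of amplitudes.

    Assumed definition: (sum |psi_i|^2)^2 / sum |psi_i|^4.
    It is roughly the number of sites the state occupies: 1 for a state on
    a single site, N for a state spread uniformly over N sites.
    """
    p2 = sum(abs(a) ** 2 for a in psi)
    p4 = sum(abs(a) ** 4 for a in psi)
    return p2 * p2 / p4
```

Because the numerator squares the norm, the result does not depend on how `psi` is normalized, which is convenient when eigenvectors come straight from a numerical diagonalization routine.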
-
History-Aware Planning for Risk-free Autonomous Navigation on Unknown Uneven Terrain
Authors:
Yinchuan Wang,
Nianfei Du,
Yongsen Qin,
Xiang Zhang,
Rui Song,
Chaoqun Wang
Abstract:
It is challenging for a mobile robot to achieve autonomous, mapless navigation in an unknown environment with uneven terrain. In this study, we present a layered and systematic pipeline. At the local level, we maintain a tree structure that is dynamically extended as navigation proceeds. This structure unifies planning with terrain identification and helps to explicitly identify hazardous areas on uneven terrain. In particular, certain nodes of the tree are consistently kept to form a sparse graph at the global level, which records the history of the exploration. A series of subgoals obtained from the tree and the graph are used to guide the navigation. To determine a subgoal, we develop an evaluation method whose input elements can be efficiently obtained from the layered structure. We conduct both simulation and real-world experiments to evaluate the developed method and its key modules. The experimental results demonstrate the effectiveness and efficiency of our method. The robot can travel through the unknown uneven region safely and reach the target rapidly without a preconstructed map.
Submitted 3 June, 2024;
originally announced June 2024.
-
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Authors:
Yu Zhang,
Qi Zhang,
Zixuan Gong,
Yiwei Shi,
Yepeng Liu,
Duoqian Miao,
Yang Liu,
Ke Liu,
Kun Yi,
Wei Fan,
Liang Hu,
Changwei Wang
Abstract:
Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, leading to rapid advancements in multimodal studies. However, CLIP faces a notable challenge in terms of inefficient data utilization. It relies on a single contrastive supervision for each image-text pair during representation learning, disregarding a substantial amount of valuable information that could offer richer supervision. Additionally, the retention of non-informative tokens leads to increased computational demands and time costs, particularly in CLIP's ViT image encoder. To address these issues, we propose Multi-Perspective Language-Image Pretraining (MLIP). In MLIP, we leverage the frequency transform's sensitivity to both high and low-frequency variations, which complements the spatial domain's sensitivity limited to low-frequency variations only. By incorporating frequency transforms and token-level alignment, we expand CLIP's single supervision into multi-domain and multi-level supervision, enabling a more thorough exploration of informative image features. Additionally, we introduce a token merging method guided by comprehensive semantics from the frequency and spatial domains. This allows us to merge tokens into multi-granularity tokens with a controllable compression rate to accelerate CLIP. Extensive experiments validate the effectiveness of our design.
Submitted 4 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
An absence of binary companions to Wolf-Rayet stars in the Small Magellanic Cloud: implications for mass loss and black hole masses at low metallicity
Authors:
A. Schootemeijer,
T. Shenar,
N. Langer,
N. Grin,
H. Sana,
G. Gräfener,
C. Schürmann,
C. Wang,
X. -T. Xu
Abstract:
In order to predict the black hole mass distributions at high redshift, we need to understand whether very massive single stars ($M>40$ M$_\odot$) at low metallicity $Z$ lose their hydrogen-rich envelopes, like their metal-rich counterparts, or whether a binary companion is required to achieve this. To test this, we undertake a deep spectroscopic search for binary companions of the seven apparently single Wolf-Rayet (WR) stars in the Small Magellanic Cloud (SMC; $Z \simeq 1/5 Z_\odot$). For each of them, we acquired six high-quality VLT-UVES spectra spread over 1.5 years. By using the narrow N V lines in these spectra, we monitor radial velocity (RV) variations to search for binary motion. We find low RV variations between 6 and 23 km/s for the seven WR stars, with a median standard deviation of $5$ km/s. Our Monte Carlo simulations imply probabilities below ~5% for any of our target WR stars to have a binary companion more massive than ~5 M$_\odot$ at orbital periods of less than a year. We estimate that the probability that all our target WR stars have companions with orbital periods shorter than 10 yr is below ~10$^{-5}$, and argue that the observed modest RV variations may originate from intrinsic atmosphere or wind variability. Our findings imply that metal-poor massive stars born with $M \gtrsim 40$ M$_\odot$ can lose most of their hydrogen-rich envelopes via stellar winds or eruptive mass loss, which strongly constrains their initial mass - black hole mass relation. We also identify two of our seven target stars (SMC AB1 and SMC AB11) as runaway stars with a peculiar radial velocity of ~80 km/s. Moreover, with all five previously detected WR binaries in the SMC exhibiting orbital periods of below 20 d, a puzzling absence of intermediate-to-long-period WR binaries has emerged, with strong implications for the outcome of massive binary interaction at low metallicity.
Submitted 3 June, 2024;
originally announced June 2024.
-
Statistics-Informed Parameterized Quantum Circuit via Maximum Entropy Principle for Data Science and Finance
Authors:
Xi-Ning Zhuang,
Zhao-Yun Chen,
Cheng Xue,
Xiao-Fan Xu,
Chao Wang,
Huan-Yu Liu,
Tai-** Sun,
Yun-Jie Wang,
Yu-Chun Wu,
Guo-** Guo
Abstract:
Quantum machine learning has demonstrated significant potential in solving practical problems, particularly in statistics-focused areas such as data science and finance. However, challenges remain in preparing and learning statistical models on a quantum processor due to issues with trainability and interpretability. In this letter, we utilize the maximum entropy principle to design a statistics-informed parameterized quantum circuit (SI-PQC) for efficiently preparing and training quantum computational statistical models, including arbitrary distributions and their weighted mixtures. The SI-PQC features a static structure with trainable parameters, enabling in-depth optimized circuit compilation, exponential reductions in resource and time consumption, and improved trainability and interpretability for learning quantum states and classical model parameters simultaneously. As an efficient subroutine for preparing and learning in various quantum algorithms, the SI-PQC addresses the input bottleneck and facilitates the injection of prior knowledge.
Submitted 18 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing
Authors:
P. J. Zhou,
Q. Yu,
M. Chen,
Y. C. Wang,
L. W. Meng,
Y. Zuo,
N. Ning,
Y. Liu,
S. G. Hu,
G. C. Qiao
Abstract:
Edge-AI computing requires high energy efficiency, low power consumption, relatively high flexibility, and compact area, all of which challenge AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency, including sparse computing, partial membrane potential updates, and non-uniform weight quantization. Multiple neuromorphic cores and multi-mode routers form a fullerene-like network-on-chip (NoC). The average degree of communication nodes exceeds that of traditional topologies by 32%, with a minimal degree variance of 0.93, allowing advanced decentralized on-chip communication. Additionally, the NoC can be scaled up through extended off-chip high-level router nodes. A RISC-V CPU and a neuromorphic processor are tightly coupled and fabricated within a 5.42 mm^2 die area under 55 nm CMOS technology. The chip has a low power density of 0.52 mW/mm^2, a 67.5% reduction compared to related works, and achieves a high neuron density of 30.23 K/mm^2. Finally, the chip is demonstrated to be effective on different datasets and achieves 0.96 pJ/SOP energy efficiency.
Submitted 3 June, 2024;
originally announced June 2024.
-
Latent Logic Tree Extraction for Event Sequence Explanation from LLMs
Authors:
Zitao Song,
Chao Yang,
Chaojie Wang,
Bo An,
Shuang Li
Abstract:
Modern high-stakes systems, such as healthcare or robotics, often generate vast streaming event sequences. Our goal is to design an efficient, plug-and-play tool to elicit logic tree-based explanations from Large Language Models (LLMs) to provide customized insights into each observed event sequence. Built on the temporal point process model for events, our method employs the likelihood function as a score to evaluate generated logic trees. We propose an amortized Expectation-Maximization (EM) learning framework and treat the logic tree as a latent variable. In the E-step, we evaluate the posterior distribution over the latent logic trees using an LLM prior and the likelihood of the observed event sequences. The LLM provides a high-quality prior for the latent logic trees; however, since the posterior is built over a discrete combinatorial space, we cannot obtain a closed-form solution. We propose to generate logic tree samples from the posterior using a learnable GFlowNet, which is a diversity-seeking generator for structured discrete variables. The M-step employs the generated logic rules to approximate marginalization over the posterior, facilitating the learning of model parameters and refining the tunable LLM prior parameters. In the online setting, our locally built, lightweight model iteratively extracts the most relevant rules from LLMs for each sequence using only a few iterations. Empirical demonstrations showcase the promising performance and adaptability of our framework.
Submitted 28 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.