
doi 10.1103/PhysRevB.107.115144

Emergent spin-glass state in the doped Hund's metal CsFe2As2

Authors: S. J. Li, D. Zhao, S. Wang, S. T. Cui, N. Z. Wang, J. Li, D. W. Song, B. L. Kang, L. X. Zheng, L. P. Nie, Z. M. Wu, Y. B. Zhou, M. Shan, Z. Sun, T. Wu, X. H. Chen

Abstract: Hund's metal is one kind of correlated metal, in which the electronic correlation is strongly influenced by the Hund's interaction. At high temperatures, while the charge and orbital degrees of freedom are quenched, the spin degrees of freedom can persist in terms of frozen moments. As temperature decreases, a coherent electronic state with characteristic orbital differentiation always emerges at low temperatures through an incoherent-to-coherent crossover, which has been widely observed in iron-based superconductors (e.g., iron selenides and AFe2As2 (A = K, Rb, Cs)). Consequently, the above frozen moments are "screened" by coupling to orbital degrees of freedom, leading to an emergent Fermi-liquid state. In contrast, the coupling among frozen moments should impede the formation of the Fermi-liquid state by competitive magnetic ordering, which is still unexplored in Hund's metals. Here, in the iron-based Hund's metal CsFe2As2, we adopt a chemical substitution at iron sites by Cr/Co atoms to explore the competitive magnetic ordering. By a comprehensive study of resistivity, magnetic susceptibility, specific heat and nuclear magnetic resonance, we demonstrate that the Fermi-liquid state is destroyed in Cr-doped CsFe2As2 by a spin-freezing transition below T_g ~ 22 K. Meanwhile, the evolution of charge degrees of freedom measured by angle-resolved photoemission spectroscopy also supports the competition between the Fermi-liquid state and the spin-glass state.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 21 pages, 7 figures

Journal ref: Phys. Rev. B 107, 115144 (2023)

arXiv:2310.16358 [pdf, other]

From Simple to Complex: A Progressive Framework for Document-level Informative Argument Extraction

Authors: Quzhe Huang, Yanxi Zhang, Dongyan Zhao

Abstract: Document-level Event Argument Extraction (EAE) requires the model to extract arguments of multiple events from a single document. Considering the underlying dependencies between these events, recent efforts leverage the idea of "memory", where the results of already predicted events are cached and can be retrieved to help the prediction of upcoming events. These methods extract events according to their order of appearance in the document; however, an event that appears in the first sentence is not necessarily the easiest to extract. Existing methods might introduce noise to the extraction of upcoming events if they rely on an incorrect prediction of previous events. In order to provide more reliable memory, we propose a simple-to-complex progressive framework for document-level EAE. Specifically, we first calculate the difficulty of each event and then conduct the extraction following a simple-to-complex order. In this way, the memory will store the most certain results, and the model can use these reliable sources to help the prediction of more difficult events. Experiments on WikiEvents show that our model outperforms SOTA by 1.4% in F1, indicating the proposed simple-to-complex framework is useful in the EAE task.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to the Findings of EMNLP 2023 (Long Paper)
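The simple-to-complex ordering described above can be illustrated with a minimal Python sketch. It assumes each event already carries a difficulty score (lower means easier) and that `predict` is a hypothetical stand-in for the argument extractor; the paper's own difficulty measure and memory retrieval are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Event:
    trigger: str
    difficulty: float  # hypothetical score: lower means easier to extract


def extract_simple_to_complex(
    events: List[Event],
    predict: Callable[[Event, List[Tuple[str, Any]]], Any],
) -> List[Tuple[str, Any]]:
    """Extract arguments easiest-first, caching each result in a memory
    that later (harder) events can consult."""
    memory: List[Tuple[str, Any]] = []
    for event in sorted(events, key=lambda e: e.difficulty):
        arguments = predict(event, memory)  # the prediction may read earlier results
        memory.append((event.trigger, arguments))
    return memory


# Toy predictor: reports how many cached results were available to it.
events = [Event("attack", 0.7), Event("die", 0.2), Event("arrest", 0.5)]
print(extract_simple_to_complex(events, lambda e, mem: len(mem)))
# [('die', 0), ('arrest', 1), ('attack', 2)]
```

Sorting by difficulty instead of document position is the only change from appearance-order extraction in this sketch; the memory itself works the same way.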

arXiv:2310.16311 [pdf]

doi 10.1007/s11433-023-2189-7

Magnetic-field-induced electronic instability of Weyl-like fermions in compressed black phosphorus

Authors: Lixuan Zheng, Kaifa Luo, Zeliang Sun, Dan Zhao, Jian Li, Dianwu Song, Shunjiao Li, Baolei Kang, Linpeng Nie, Min Shan, Zhimian Wu, Yanbing Zhou, Xi Dai, Hongming Weng, Rui Yu, Tao Wu, Xianhui Chen

Abstract: Revealing the role of Coulomb interaction in topological semimetals with Dirac/Weyl-like band dispersion shapes a new frontier in condensed matter physics. Topological node-line semimetals (TNLSMs), anticipated as a fertile ground for exploring electronic correlation effects due to the anisotropy associated with their node-line structure, have recently attracted considerable attention. In this study, we report an experimental observation of correlation effects in TNLSMs realized by black phosphorus (BP) under hydrostatic pressure. By performing a combination of nuclear magnetic resonance measurements and band calculations on compressed BP, a magnetic-field-induced electronic instability of Weyl-like fermions is identified under an external magnetic field parallel to the so-called nodal ring in the reciprocal space. Anomalous spin fluctuations serving as the fingerprint of electronic instability are observed at low temperatures, and they are observed to maximize at approximately 1.0 GPa. This study presents compressed BP as a realistic material platform for exploring the rich physics in strongly coupled Weyl-like fermions.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 10 pages, 4 figures

Journal ref: Sci. China-Phys. Mech. Astron. 66, 117011 (2023)

arXiv:2310.15594 [pdf, other]

Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression

Authors: Jiduan Liu, Jiahao Liu, Qifan Wang, **gang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan

Abstract: Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, the massive size of these models poses huge challenges for their deployment in real-world applications. While numerous model compression techniques have been proposed, most of them are not well-suited for achieving extreme model compression when there is a significant gap in model scale. In this paper, we introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT), which effectively transfers the knowledge of LLMs to extremely small-scale models (e.g., 1%). In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning techniques are employed. Extensive experiments are conducted on low-resource tasks from SuperGLUE and GLUE benchmarks. The results demonstrate that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.

Submitted 24 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Findings
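The abstract does not specify how the small model queries the knowledge store. As an illustrative assumption only, the sketch below uses plain cosine-similarity top-k retrieval over embedding vectors; the store entries, the vectors and the `retrieve` helper are hypothetical, and the soft prompt tuning and PPO stages are omitted.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float],
             store: List[Tuple[str, Sequence[float]]],
             k: int = 2) -> List[str]:
    """Return the texts of the k store entries most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical knowledge store of (LLM-generated text, embedding) pairs.
store = [
    ("positive review", [1.0, 0.0]),
    ("negative review", [0.0, 1.0]),
    ("mixed review", [0.7, 0.7]),
]
print(retrieve([0.9, 0.1], store))  # ['positive review', 'mixed review']
```

The retrieved texts would then be passed to the small model as extra context; how RetriKT actually encodes and uses them is described in the paper, not here.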

arXiv:2310.14277 [pdf, other]

A Survey on Continual Semantic Segmentation: Theory, Challenge, Method and Application

Authors: Bo Yuan, Danpei Zhao

Abstract: Continual learning, also known as incremental learning or life-long learning, stands at the forefront of deep learning and AI systems. It breaks through the obstacle of one-way training on closed sets and enables continuous adaptive learning under open-set conditions. In the recent decade, continual learning has been explored and applied in multiple fields, especially in computer vision, covering classification, detection and segmentation tasks. Continual semantic segmentation (CSS) is a challenging, intricate and burgeoning task because of its dense-prediction nature. In this paper, we present a review of CSS, committing to building a comprehensive survey on problem formulations, primary challenges, universal datasets, neoteric theories and multifarious applications. Concretely, we begin by elucidating the problem definitions and primary challenges. Based on an in-depth investigation of relevant approaches, we sort out and categorize current CSS models into two main branches: data-replay and data-free. In each branch, the corresponding approaches are clustered by similarity and thoroughly analyzed, followed by qualitative comparison and quantitative reproductions on relevant datasets. Besides, we also introduce four CSS specialities with diverse application scenarios and development tendencies. Furthermore, we develop a benchmark for CSS encompassing representative references, evaluation results and reproductions, which is available at https://github.com/YBIO/SurveyCSS.
We hope this survey can serve as a reference-worthy and stimulating contribution to the advancement of the life-long learning field, while also providing valuable perspectives for related fields.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 20 pages, 12 figures. Undergoing Review

arXiv:2310.13065 [pdf, other]

Creative Robot Tool Use with Large Language Models

Authors: Mengdi Xu, Peide Huang, Wenhao Yu, Shiqi Liu, Xilun Zhang, Yaru Niu, Tingnan Zhang, Fei Xia, Jie Tan, Ding Zhao

Abstract: Tool use is a hallmark of advanced intelligence, exemplified in both animal behavior and robotic capabilities. This paper investigates the feasibility of imbuing robots with the ability to creatively use tools in tasks that involve implicit physical constraints and long-term planning. Leveraging Large Language Models (LLMs), we develop RoboTool, a system that accepts natural language instructions and outputs executable code for controlling robots in both simulated and real-world environments. RoboTool incorporates four pivotal components: (i) an "Analyzer" that interprets natural language to discern key task-related concepts, (ii) a "Planner" that generates comprehensive strategies based on the language input and key concepts, (iii) a "Calculator" that computes parameters for each skill, and (iv) a "Coder" that translates these plans into executable Python code. Our results show that RoboTool can not only comprehend explicit or implicit physical constraints and environmental factors but also demonstrate creative tool use. Unlike traditional Task and Motion Planning (TAMP) methods that rely on explicit optimization, our LLM-based system offers a more flexible, efficient, and user-friendly solution for complex robotics tasks. Through extensive experiments, we validate that RoboTool is proficient in handling tasks that would otherwise be infeasible without the creative use of tools, thereby expanding the capabilities of robotic systems. Demos are available on our project page: https://creative-robotool.github.io/.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 19 pages, 14 figures, 2 tables

arXiv:2310.10733 [pdf, other]

From Halos to Galaxies. VII. The Connections Between Stellar Mass Growth History, Quenching History and Halo Assembly History for Central Galaxies

Authors: Cheqiu Lyu, Yingjie Peng, Yipeng **g, Xiaohu Yang, Luis C. Ho, Alvio Renzini, Bitao Wang, Kai Wang, Bingxiao Xu, Dingyi Zhao, **g Dou, Qiusheng Gu, Roberto Maiolino, Filippo Mannucci, Feng Yuan

Abstract: The assembly of galaxies over cosmic time is tightly connected to the assembly of their host dark matter halos. We investigate the stellar mass growth history and the chemical enrichment history of central galaxies in SDSS-MaNGA. We find that the derived stellar metallicity of passive central galaxies is always higher than that of the star-forming ones. This stellar metallicity enhancement becomes progressively larger towards low-mass galaxies (at a given epoch) and earlier epochs (at a given stellar mass), which suggests strangulation as the primary mechanism for star formation quenching in central galaxies not only in the local universe, but also very likely at higher redshifts up to $z\sim3$. We show that at the same present-day stellar mass, passive central galaxies assembled half of their final stellar mass $\sim 2$ Gyr earlier than star-forming central galaxies, which agrees well with the semi-analytic model. Exploring the semi-analytic model, we find that this is because passive central galaxies reside, on average, in more massive halos with a higher halo mass increase rate across cosmic time. As a consequence, passive central galaxies are assembled faster and also quenched earlier than their star-forming counterparts. Meanwhile, at the same present-day halo mass, different halo assembly histories also produce very different final stellar masses of the central galaxy within, and halos assembled earlier host more massive centrals with a higher quenched fraction, in particular around the "golden halo mass" at $10^{12}\mathrm{M_\odot}$.
Our results call attention back to the dark matter halo as a key driver of galaxy evolution.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 19 pages, 11 figures. Accepted by ApJ
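The comparison of when galaxies "assembled half of their final stellar mass" refers to a half-mass assembly time. A minimal sketch, assuming a cumulative stellar mass history sampled at increasing cosmic times and linear interpolation between samples (the paper's exact estimator is not given in the abstract), is:

```python
from typing import Sequence


def half_mass_time(times: Sequence[float], masses: Sequence[float]) -> float:
    """Cosmic time (same units as `times`) at which a cumulative stellar mass
    history first reaches half of its final value, by linear interpolation.

    Assumes `times` is increasing and `masses` is non-decreasing."""
    target = 0.5 * masses[-1]
    if masses[0] >= target:
        return times[0]
    for i in range(1, len(times)):
        if masses[i] >= target:
            t0, t1 = times[i - 1], times[i]
            m0, m1 = masses[i - 1], masses[i]
            # Here m0 < target <= m1, so the denominator is positive.
            return t0 + (target - m0) * (t1 - t0) / (m1 - m0)
    return times[-1]


# Hypothetical histories (Gyr, fraction of final mass), not data from the paper.
t = [1.0, 3.0, 5.0, 7.0, 13.0]
early = [0.2, 0.6, 0.8, 0.9, 1.0]
late = [0.1, 0.3, 0.6, 0.8, 1.0]
print(half_mass_time(t, early), half_mass_time(t, late))
```

A smaller value means the galaxy reached half of its final mass earlier in cosmic time.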

arXiv:2310.10125 [pdf, other]

Few-shot Action Recognition with Captioning Foundation Models

Authors: Xiang Wang, Shiwei Zhang, Hangjie Yuan, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

Abstract: Transferring vision-language knowledge from pretrained multimodal foundation models to various downstream tasks is a promising direction. However, most current few-shot action recognition methods are still limited to a single visual modality input due to the high cost of annotating additional textual descriptions. In this paper, we develop an effective plug-and-play framework called CapFSAR to exploit the knowledge of multimodal models without manually annotating text. To be specific, we first utilize a captioning foundation model (i.e., BLIP) to extract visual features and automatically generate associated captions for input videos. Then, we apply a text encoder to the synthetic captions to obtain representative text embeddings. Finally, a visual-text aggregation module based on Transformer is further designed to incorporate cross-modal spatio-temporal complementary information for reliable few-shot matching. In this way, CapFSAR can benefit from powerful multimodal knowledge of pretrained foundation models, yielding more comprehensive classification in the low-shot regime. Extensive experiments on multiple standard few-shot benchmarks demonstrate that the proposed CapFSAR performs favorably against existing methods and achieves state-of-the-art performance. The code will be made publicly available.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.10033 [pdf, other]

Deep Unfolding Network for Image Compressed Sensing by Content-adaptive Gradient Updating and Deformation-invariant Non-local Modeling

Authors: Wenxue Cui, Xiaopeng Fan, Jian Zhang, Debin Zhao

Abstract: Inspired by certain optimization solvers, the deep unfolding network (DUN) has attracted much attention in recent years for image compressed sensing (CS). However, there still exist the following two issues: 1) In existing DUNs, most hyperparameters are usually content independent, which greatly limits their adaptability for different input contents. 2) In each iteration, a plain convolutional neural network is usually adopted, which weakens the perception of wider context prior and therefore depresses the expressive ability. In this paper, inspired by the traditional Proximal Gradient Descent (PGD) algorithm, a novel DUN for image compressed sensing (dubbed DUN-CSNet) is proposed to solve the above two issues. Specifically, for the first issue, a novel content adaptive gradient descent network is proposed, in which a well-designed step size generation sub-network is developed to dynamically allocate the corresponding step sizes for different textures of input image by generating a content-aware step size map, realizing a content-adaptive gradient updating. For the second issue, considering the fact that many similar patches exist in an image but have undergone a deformation, a novel deformation-invariant non-local proximal mapping network is developed, which can adaptively build the long-range dependencies between the nonlocal patches by deformation-invariant non-local modeling, leading to a wider perception on context priors. Extensive experiments manifest that the proposed DUN-CSNet outperforms existing state-of-the-art CS methods by large margins.

Submitted 15 October, 2023; originally announced October 2023.

Comments: 16 pages, 13 figures. Accepted by IEEE Transactions on Multimedia (TMM)
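For readers unfamiliar with the PGD iteration that DUN-CSNet unfolds, the following minimal sketch runs classical proximal gradient descent (ISTA) for 0.5*||A x - y||^2 + lam*||x||_1. Two substitutions are assumptions made for illustration: soft-thresholding stands in for the paper's learned deformation-invariant proximal network, and the fixed per-element `steps` list stands in for the content-aware step size map that DUN-CSNet generates with a sub-network.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def pgd_sparse_recovery(A: Matrix, y: Vector, steps: Vector,
                        lam: float, n_iter: int = 100) -> Vector:
    """Proximal gradient descent (ISTA) for 0.5*||A x - y||^2 + lam*||x||_1.

    `steps` holds one step size per unknown (a fixed stand-in for a
    content-aware step size map)."""
    At = transpose(A)
    x = [0.0] * len(At)
    for _ in range(n_iter):
        residual = [r - yi for r, yi in zip(matvec(A, x), y)]  # A x - y
        grad = matvec(At, residual)                            # A^T (A x - y)
        z = [xi - s * g for xi, s, g in zip(x, steps, grad)]  # gradient step
        # Proximal step: soft-thresholding with element-wise threshold s * lam.
        x = [math.copysign(max(abs(zi) - s * lam, 0.0), zi)
             for zi, s in zip(z, steps)]
    return x


# Identity measurement: the small second coefficient is thresholded to zero.
print(pgd_sparse_recovery([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.05], [1.0, 1.0], lam=0.1))
```

A deep unfolding network replaces a fixed number of these iterations with learned layers, so each step size and proximal operator can differ per stage.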

arXiv:2310.09469 [pdf, other]

Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner

Authors: Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao, Wen** Wang, Yong-** Liu

Abstract: A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed. Existing acceleration algorithms simplify the sampling by skipping most steps yet exhibit considerable performance degradation. By viewing the generation of diffusion models as a discretized integrating process, we argue that the quality drop is partly caused by applying an inaccurate integral direction to a timestep interval. To rectify this issue, we propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost. Specifically, at each denoising step, we replace the original parameterization by conditioning the network on a new timestep, which is obtained by aligning the sampling distribution to the real distribution. Extensive experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods, especially when there are few denoising steps. For example, when using 10 denoising steps on the popular LSUN Bedroom dataset, we improve the FID of DDIM from 9.65 to 6.07, simply by adopting our method for a more appropriate set of timesteps. Code will be made publicly available.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.09457 [pdf, other]

UCM-Net: A Lightweight and Efficient Solution for Skin Lesion Segmentation using MLP and CNN

Authors: Chunyu Yuan, Dongfang Zhao, Sos S. Agaian

Abstract: Skin cancer poses a significant public health challenge, necessitating efficient diagnostic tools. We introduce UCM-Net, a novel skin lesion segmentation model combining Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN). This lightweight, efficient architecture, deviating from traditional UNet designs, dramatically reduces computational demands, making it ideal for mobile health applications. Evaluated on PH2, ISIC 2017, and ISIC 2018 datasets, UCM-Net demonstrates robust performance with fewer than 50KB parameters and requires less than 0.05 Giga Operations Per Second (GLOPs). Moreover, its minimal memory requirement of just 1.19MB in a CPU environment positions it as a potential benchmark for efficiency in skin lesion segmentation, suitable for deployment in resource-constrained settings. In order to facilitate accessibility and further research in the field, the UCM-Net source code is available at https://github.com/chunyuyuan/UCM-Net.

Submitted 24 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 17 pages, accepted by Journal of Biomedical Signal Processing and Control

arXiv:2310.07187 [pdf, other]

Kernel Cox partially linear regression: building predictive models for cancer patients' survival

Authors: Yaohua Rong, Sihai Dave Zhao, Xia Zheng, Yi Li

Abstract: Wide heterogeneity exists in cancer patients' survival, ranging from a few months to several decades. To accurately predict clinical outcomes, it is vital to build an accurate predictive model that relates patients' molecular profiles with patients' survival. With complex relationships between survival and high-dimensional molecular predictors, it is challenging to conduct non-parametric modeling and remove irrelevant predictors simultaneously. In this paper, we build a kernel Cox proportional hazards semi-parametric model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit the model. We use the kernel machine method to describe the complex relationship between survival and predictors, while automatically removing irrelevant parametric and non-parametric predictors through a LASSO penalty. An efficient high-dimensional algorithm is developed for the proposed method. Comparison with other competing methods in simulation shows that the proposed method always has better predictive accuracy. We apply this method to analyze a multiple myeloma dataset and predict patients' death burden based on their gene expressions. Our results can help classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.

Submitted 11 October, 2023; originally announced October 2023.
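RegGKM itself combines a kernel machine with a garrote-type LASSO penalty, and that machinery is not reproduced here. As a simpler, hypothetical illustration of the objective family it builds on, the sketch below evaluates the Cox negative log partial likelihood (Breslow form, no tie correction) of a purely linear predictor plus an L1 (LASSO) penalty.

```python
import math
from typing import List, Sequence


def cox_neg_log_partial_likelihood(risk: Sequence[float],
                                   time: Sequence[float],
                                   event: Sequence[int]) -> float:
    """Negative log partial likelihood of a Cox model.

    risk[i] is the linear predictor for patient i, time[i] the follow-up time,
    and event[i] is 1 if death was observed, 0 if censored."""
    total = 0.0
    for i, observed in enumerate(event):
        if observed:
            # Risk set: every patient still under observation at time[i].
            denom = sum(math.exp(risk[j]) for j in range(len(time)) if time[j] >= time[i])
            total -= risk[i] - math.log(denom)
    return total


def lasso_cox_objective(beta: Sequence[float], X: List[Sequence[float]],
                        time: Sequence[float], event: Sequence[int],
                        lam: float) -> float:
    """Cox loss of the linear predictor X @ beta plus an L1 penalty on beta."""
    risk = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return cox_neg_log_partial_likelihood(risk, time, event) + lam * sum(abs(b) for b in beta)


# Hypothetical data: 3 patients, 2 predictors, all deaths observed.
X = [[0.5, 1.0], [1.5, -0.2], [0.1, 0.3]]
print(lasso_cox_objective([0.0, 0.0], X, [1.0, 2.0, 3.0], [1, 1, 1], lam=0.1))
```

With all coefficients at zero, each observed death contributes the log of its risk-set size, so the value above equals log(3) + log(2) + log(1) = log(6). The L1 term is what drives irrelevant coefficients to exactly zero when the objective is minimized.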

arXiv:2310.06903 [pdf, other]

Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization

Authors: Fan Yang, Wenxuan Zhou, Zuxin Liu, Ding Zhao, David Held

Abstract: Safe Reinforcement Learning (RL) plays an important role in applying RL algorithms to safety-critical real-world applications, addressing the trade-off between maximizing rewards and adhering to safety constraints. This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off effectively. Our approach embeds safety constraints within the action space of a modified Markov Decision Process (MDP). The RL agent produces a sequence of actions that are transformed into safe trajectories by a trajectory optimizer, thereby effectively ensuring safety and increasing training stability. This novel approach excels in its performance on challenging Safety Gym tasks, achieving significantly higher rewards and near-zero safety violations during inference. The method's real-world applicability is demonstrated through a safe and effective deployment in a real robot task of box-pushing around obstacles.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.06343 [pdf, other]

Boosting Continuous Control with Consistency Policy

Authors: Yuhui Chen, Haoran Li, Dongbin Zhao

Abstract: Due to its training stability and strong expression, the diffusion model has attracted considerable attention in offline reinforcement learning. However, several challenges have also come with it: 1) The demand for a large number of diffusion steps makes the diffusion-model-based methods time inefficient and limits their applications in real-time control; 2) How to achieve policy improvement with accurate guidance for diffusion model-based policy is still an open problem. Inspired by the consistency model, we propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL), which derives action from noise by a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance when updating diffusion model-based policy with the learned Q-function. We demonstrate that CPQL can achieve policy improvement with accurate guidance for offline reinforcement learning, and can be seamlessly extended for online RL tasks. Experimental results indicate that CPQL achieves new state-of-the-art performance on 11 offline and 21 online tasks, significantly improving inference speed by nearly 45 times compared to Diffusion-QL. We will release our code later.

Submitted 23 January, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: 18 pages, 9 pages

arXiv:2310.05905 [pdf, other]

TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models

Authors: Zuxin Liu, Jesse Zhang, Kavosh Asadi, Yao Liu, Ding Zhao, Shoham Sabach, Rasool Fakoor

Abstract: The full potential of large pretrained models remains largely untapped in control domains like robotics. This is mainly because of the scarcity of data and the computational challenges associated with training or fine-tuning these large models for such applications. Prior work mainly emphasizes either effective pretraining of large models for decision-making or single-task adaptation. But real-world problems will require data-efficient, continual adaptation for new control tasks. Recognizing these constraints, we introduce TAIL (Task-specific Adapters for Imitation Learning), a framework for efficient adaptation to new control tasks. Inspired by recent advancements in parameter-efficient fine-tuning in language domains, we explore efficient fine-tuning techniques in TAIL, such as Bottleneck Adapters, P-Tuning, and Low-Rank Adaptation (LoRA), to adapt large pretrained models for new tasks with limited demonstration data. Our extensive experiments in large-scale language-conditioned manipulation tasks comparing prevalent parameter-efficient fine-tuning techniques and adaptation baselines suggest that TAIL with LoRA can achieve the best post-adaptation performance with only 1% of the trainable parameters of full fine-tuning, while avoiding catastrophic forgetting and preserving adaptation plasticity in continual learning settings.

Submitted 8 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: Published at ICLR 2024
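TAIL's best configuration uses LoRA. As a rough illustration of why LoRA trains so few parameters, the sketch below applies a frozen weight W plus a low-rank update B A and computes the trainable fraction for a single layer. The layer sizes and rank are hypothetical, and the resulting fraction is not meant to reproduce the 1% reported for TAIL.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, scale: float = 1.0) -> Vector:
    """h = W x + scale * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [h + scale * u for h, u in zip(base, update)]


def lora_trainable_fraction(d_out: int, d_in: int, r: int) -> float:
    """Trainable LoRA parameters r*(d_in + d_out) relative to the d_out*d_in of W."""
    return r * (d_in + d_out) / (d_out * d_in)


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # rank r = 1
B_init = [[0.0], [0.0]]   # LoRA commonly initializes B to zero
print(lora_forward(W, A, B_init, [2.0, 3.0]))    # [2.0, 3.0]: starts as the frozen layer
print(lora_trainable_fraction(1024, 1024, r=8))  # 0.015625
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one, and only r*(d_in + d_out) values per layer are updated during adaptation.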

arXiv:2310.05400 [pdf, other]

Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers

Authors: Shiyue Cao, Yueqin Yin, Lianghua Huang, Yu Liu, Xin Zhao, Deli Zhao, Kaiqi Huang

Abstract: Vector-quantized image modeling has shown great potential in synthesizing high-quality images. However, generating high-resolution images remains a challenging task due to the quadratic computational overhead of the self-attention process. In this study, we seek to explore a more efficient two-stage framework for high-resolution image generation with improvements in the following three aspects. (1) Based on the observation that the first quantization stage has solid local property, we employ a local attention-based quantization model instead of the global attention mechanism used in previous methods, leading to better efficiency and reconstruction quality. (2) We emphasize the importance of multi-grained feature interaction during image generation and introduce an efficient attention mechanism that combines global attention (long-range semantic consistency within the whole image) and local attention (fine-grained details). This approach results in faster generation speed, higher generation fidelity, and improved resolution. (3) We propose a new generation pipeline incorporating autoencoding training and autoregressive generation strategy, demonstrating a better paradigm for image synthesis. Extensive experiments demonstrate the superiority of our approach in high-quality and high-resolution image reconstruction and generation.

Submitted 9 October, 2023; originally announced October 2023.

Comments: This paper is accepted to ICCV2023
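To make the quadratic-cost argument in the Efficient-VQGAN abstract concrete, the arithmetic below counts the query-key pairs scored by one attention layer over an H×W token grid, for global self-attention versus non-overlapping window (local) attention. This is a back-of-the-envelope sketch under assumed settings (square grids, 8×8 non-overlapping windows, no shifted windows or global tokens); the helper `attention_pairs` is hypothetical and is not the paper's architecture, whose exact attention layout the abstract does not give.

```python
from typing import Optional


def attention_pairs(height: int, width: int, window: Optional[int] = None) -> int:
    """Count query-key pairs scored by one self-attention layer on a token grid.

    window=None means global attention (every token attends to every token);
    otherwise tokens attend only within non-overlapping window x window blocks.
    """
    n = height * width
    if window is None:
        return n * n  # quadratic in the number of tokens
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    # Equals n * window**2: linear in n for a fixed window size.
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    for side in (32, 64, 128):
        global_cost = attention_pairs(side, side)
        local_cost = attention_pairs(side, side, window=8)
        ratio = global_cost // local_cost
        print(f"{side}x{side} tokens: global={global_cost:,} local={local_cost:,} ratio={ratio}")
```

Doubling the grid side multiplies the global count by 16 but the windowed count only by 4, which is the kind of efficiency gap that motivates using local attention in the quantization stage.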

arXiv:2310.05245 [pdf, other]

Influence of Camera-LiDAR Configuration on 3D Object Detection for Autonomous Driving

Authors: Ye Li, Hanjiang Hu, Zuxin Liu, Xiaohao Xu, Xiaonan Huang, Ding Zhao

Abstract: Cameras and LiDARs are both important sensors for autonomous driving, playing critical roles in 3D object detection. Camera-LiDAR fusion has been a prevalent solution for robust and accurate driving perception. In contrast to the vast majority of existing works, which focus on improving the performance of 3D object detection through cross-modal schemes, deep learning algorithms, and training tricks, we devote attention to the impact of sensor configurations on the performance of learning-based methods. To achieve this, we propose a unified information-theoretic surrogate metric for camera and LiDAR evaluation based on the proposed sensor perception model. We also design an accelerated high-quality framework for data acquisition, model training, and performance evaluation that functions with the CARLA simulator. To show the correlation between detection performance and our surrogate metrics, we conduct experiments using several camera-LiDAR placements and parameters inspired by self-driving companies and research institutions. Extensive experimental results of representative algorithms on the nuScenes dataset validate the effectiveness of our surrogate metric, demonstrating that sensor configurations significantly impact point-cloud-image fusion based detection models, contributing up to a 30% discrepancy in average precision.

Submitted 2 March, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2310.05149 [pdf, other]

Retrieval-Generation Synergy Augmented Large Language Models

Authors: Zhangyin Feng, Xiaocheng Feng, Dezhi Zhao, Mao** Yang, Bing Qin

Abstract: Large language models augmented with task-relevant documents have demonstrated impressive performance on knowledge-intensive tasks. However, regarding how to obtain effective documents, the existing methods are mainly divided into two categories. One is to retrieve from an external knowledge base, and the other is to utilize large language models to generate documents. We propose an iterative retrieval-generation collaborative framework. It is not only able to leverage both parametric and non-parametric knowledge, but also helps to find the correct reasoning path through retrieval-generation interactions, which is very important for tasks that require multi-step reasoning. We conduct experiments on four question answering datasets, including single-hop QA and multi-hop QA tasks. Empirical results show that our method significantly improves the reasoning ability of large language models and outperforms previous baselines.

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2310.04979 [pdf, other]

Initial Task Assignment in Multi-Human Multi-Robot Teams: An Attention-enhanced Hierarchical Reinforcement Learning Approach

Authors: Ruiqi Wang, Dezhong Zhao, Arjun Gupte, Byung-Cheol Min

Abstract: Multi-human multi-robot (MH-MR) teams hold tremendous potential for tackling intricate and massive missions by merging the distinct strengths and expertise of individual members. The inherent heterogeneity of these teams necessitates advanced initial task assignment (ITA) methods that align tasks with the intrinsic capabilities of team members from the outset. While existing reinforcement learning approaches show encouraging results, they might fall short in addressing the nuances of long-horizon ITA problems, particularly in settings with large-scale MH-MR teams or multifaceted tasks. To bridge this gap, we propose an attention-enhanced hierarchical reinforcement learning approach that decomposes the complex ITA problem into structured sub-problems, facilitating more efficient allocations. To bolster sub-policy learning, we introduce a hierarchical cross-attribute attention (HCA) mechanism, encouraging each sub-policy within the hierarchy to discern and leverage the specific nuances in the state space that are crucial for its respective decision-making phase. Through an extensive environmental surveillance case study, we demonstrate the benefits of our model and the HCA mechanism within it.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2310.04828 [pdf, other]

Guardians as You Fall: Active Mode Transition for Safe Falling

Authors: Yikai Wang, Mengdi Xu, Guanya Shi, Ding Zhao

Abstract: Recent advancements in optimal control and reinforcement learning have enabled quadrupedal robots to perform various agile locomotion tasks over diverse terrains. During these agile motions, ensuring the stability and resiliency of the robot is a primary concern to prevent catastrophic falls and mitigate potential damage. Previous methods primarily focus on recovery policies after the robot falls. To the best of our knowledge, there is no active safe-falling solution. In this paper, we propose Guardians as You Fall (GYF), a safe falling/tumbling and recovery framework that can actively tumble and recover to stable modes to reduce damage in highly dynamic scenarios. The key idea of GYF is to adaptively traverse different stable modes via active tumbling before the robot shifts to irrecoverable poses. Via comprehensive simulation and real-world experiments, we show that GYF significantly reduces the maximum acceleration and jerk of the robot base compared to the baselines. In particular, GYF reduces the maximum acceleration and jerk by 20%~73% in different scenarios in simulation and real-world experiments. GYF offers a new perspective on safe falling and recovery in locomotion tasks, potentially enabling much more aggressive explorations of existing agile locomotion skills.

Submitted 7 October, 2023; originally announced October 2023.

Comments: website: https://sites.google.com/view/guardians-as-you-fall/

arXiv:2310.03718 [pdf, other]

Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning

Authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao

Abstract: Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying safety constraint requirements during deployment without retraining remains a largely unexplored and challenging area. In this work, we formulate the versatile safe RL problem and consider two primary requirements: training efficiency and zero-shot adaptation capability. To address them, we introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules: (1) Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and (2) Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization. Our extensive experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance while preserving zero-shot adaptation capabilities to different constraint thresholds data-efficiently. This makes our approach suitable for real-world dynamic applications.

Submitted 29 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.03003 [pdf, other]

From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference

Authors: Siddharth Samsi, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, Jeremy Kepner, Devesh Tiwari, Vijay Gadepally

Abstract: Large language models (LLMs) have exploded in popularity due to their new generative capabilities that go far beyond prior state-of-the-art. These technologies are increasingly being leveraged in various domains such as law, finance, and medicine. However, these models carry significant computational challenges, especially the compute and energy costs required for inference. Inference energy costs already receive less attention than the energy costs of training LLMs -- despite how often these large models are called on to conduct inference in reality (e.g., ChatGPT). As these state-of-the-art LLMs see increasing usage and deployment in various domains, a better understanding of their resource utilization is crucial for cost-savings, scaling performance, efficient hardware usage, and optimal inference strategies. In this paper, we describe experiments conducted to study the computational and energy utilization of inference with LLMs. We benchmark and conduct a preliminary analysis of the inference performance and inference energy costs of different sizes of LLaMA -- a recent state-of-the-art LLM -- developed by Meta AI on two generations of popular GPUs (NVIDIA V100 & A100) and two datasets (Alpaca and GSM8K) to reflect the diverse set of tasks/benchmarks for LLMs in research and practice. We present the results of multi-node, multi-GPU inference using model sharding across up to 32 GPUs. To our knowledge, our work is one of the first to study LLM inference performance from the perspective of computational and energy resources at this scale.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2310.00871 [pdf, other]

COMPOSER: Scalable and Robust Modular Policies for Snake Robots

Authors: Yuyou Zhang, Yaru Niu, Xingyu Liu, Ding Zhao

Abstract: Snake robots have showcased remarkable compliance and adaptability in their interaction with environments, mirroring the traits of their natural counterparts. While their hyper-redundant and high-dimensional characteristics add to this adaptability, they also pose great challenges to robot control. Instead of perceiving the hyper-redundancy and flexibility of snake robots as mere challenges, there lies an unexplored potential in leveraging these traits to enhance robustness and generalizability at the control policy level. We seek to develop a control policy that effectively breaks down the high dimensionality of snake robots while harnessing their redundancy. In this work, we consider the snake robot as a modular robot and formulate the control of the snake robot as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. Each segment of the snake robot functions as an individual agent. Specifically, we incorporate a self-attention mechanism to enhance the cooperative behavior between agents. A high-level imagination policy is proposed to provide additional rewards to guide the low-level control policy. We validate the proposed method COMPOSER with five snake robot tasks, including goal reaching, wall climbing, shape formation, tube crossing, and block pushing. COMPOSER achieves the highest success rate across all tasks when compared to a centralized baseline and four modular policy baselines. Additionally, we show enhanced robustness against module corruption and significantly superior zero-shot generalizability in our proposed method. The videos of this work are available on our project page: https://sites.google.com/view/composer-snake/.

Submitted 1 October, 2023; originally announced October 2023.

Comments: 7 pages, 5 figures

arXiv:2310.00814 [pdf, other]

Quark masses and low energy constants in the continuum from the tadpole improved clover ensembles

Authors: Zhi-Cheng Hu, Bo-Lun Hu, Ji-Hao Wang, Ming Gong, Liuming Liu, Peng Sun, Wei Sun, Wei Wang, Yi-Bo Yang, Dian-Jun Zhao

Abstract: We present the light-flavor quark masses and low energy constants using the 2+1 flavor full-QCD ensembles with stout smeared clover fermion action and Symanzik gauge action. Both the fermion and gauge actions are tadpole improved self-consistently. The simulations are performed on 11 ensembles at 3 lattice spacings $a\in[0.05,0.11]$ fm, 4 spatial sizes $L\in[2.5, 5.1]$ fm, 7 pion masses $m_\pi\in[135,350]$ MeV, and several values of the strange quark mass. The quark mass is defined through the partially conserved axial current (PCAC) relation and renormalized to $\overline{\mathrm{MS}}$ at 2 GeV through the intermediate regularization independent momentum subtraction (RI/MOM) scheme. The systematic uncertainty of using the symmetric momentum subtraction (SMOM) scheme is also included. Finally, we predict $m_u=2.45(22)(20)$ MeV, $m_d=4.74(11)(09)$ MeV, and $m_s=98.8(2.9)(4.7)$ MeV with the systematic uncertainties from lattice spacing determination, continuum extrapolation and renormalization constant included. We also obtain the chiral condensate $\Sigma^{1/3}=268.6(3.6)(0.7)$ MeV and the pion decay constant $F=86.6(7)(1.4)$ MeV in the $N_f=2$ chiral limit, and the next-to-leading order low energy constants $\ell_3=2.43(54)(05)$ and $\ell_4=4.322(75)(96)$.

Submitted 7 January, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Version accepted by PRD. 7 pages, 4 figures, with more details in the appendix

arXiv:2310.00178 [pdf, other]

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

Authors: Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

Abstract: Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR) systems towards rare entities that are relevant to the specific user or application scenarios. We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching. During beam search, we boost the score of a token extension if it extends matching into a set of biasing phrases. Our method simulates the classical approaches often implemented in the weighted finite state transducer (WFST) framework, but avoids the FST language altogether, with careful considerations on memory footprint and efficiency on tensor processing units (TPUs) by vectorization. Without introducing additional model parameters, our method achieves significant word error rate (WER) reductions on biasing test sets by itself, and yields further performance gain when combined with a model-based biasing method.

Submitted 29 September, 2023; originally announced October 2023.
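The KMP-based biasing idea in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`failure_function`, `advance`, `prepare`, `biasing_bonus`), the token-list phrases, and the bonus rule (a fixed `weight` times the change in the longest matched prefix, so partial credit is withdrawn when a match breaks and kept once a phrase completes) are all hypothetical. Vectorization across beam hypotheses and TPU-specific details are not reproduced.

```python
from typing import List, Sequence, Tuple

# A biasing phrase paired with its KMP failure table (phrases assumed non-empty).
Phrase = Tuple[Sequence[str], List[int]]


def failure_function(pattern: Sequence[str]) -> List[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def advance(pattern: Sequence[str], fail: List[int], state: int, token: str) -> int:
    """Advance a match state (number of pattern tokens matched) by one token."""
    while state > 0 and (state == len(pattern) or pattern[state] != token):
        state = fail[state - 1]  # fall back without re-reading earlier tokens
    if state < len(pattern) and pattern[state] == token:
        state += 1
    return state


def prepare(phrases: List[Sequence[str]]) -> List[Phrase]:
    return [(p, failure_function(p)) for p in phrases]


def biasing_bonus(phrases: List[Phrase], states: List[int], token: str,
                  weight: float = 1.0) -> Tuple[List[int], float]:
    """Hypothetical scoring rule: bonus = weight * change in the longest partial match."""
    new_states = [advance(p, f, s, token) for (p, f), s in zip(phrases, states)]
    bonus = weight * (max(new_states, default=0) - max(states, default=0))
    # A completed phrase keeps its bonus: restart from its failure state so the
    # next token does not revoke the credit already granted.
    new_states = [f[-1] if s == len(p) else s for (p, f), s in zip(phrases, new_states)]
    return new_states, bonus


if __name__ == "__main__":
    phrases = prepare([["new", "york"], ["new", "jersey"]])
    states = [0] * len(phrases)
    bonuses = []
    for tok in ["i", "love", "new", "york"]:
        states, b = biasing_bonus(phrases, states, tok)
        bonuses.append(b)
    print(bonuses)  # prints [0.0, 0.0, 1.0, 1.0]
```

In a real decoder, each beam hypothesis would carry its own `states` list and the bonus would be added to that hypothesis's log-probability; the KMP failure table is what lets a broken match resume from the longest still-valid prefix instead of restarting from scratch.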

arXiv:2310.00109 [pdf, other]

FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things

Authors: Samiul Alam, Tuo Zhang, Tiantian Feng, Hui Shen, Zhichao Cao, Dong Zhao, JeongGil Ko, Kiran Somasundaram, Shrikanth S. Narayanan, Salman Avestimehr, Mi Zhang

Abstract: Federated learning (FL) is highly relevant to the realm of Artificial Intelligence of Things (AIoT). However, most existing FL works do not use datasets collected from authentic IoT devices and thus do not capture the unique modalities and inherent challenges of IoT data. To fill this critical gap, in this work, we introduce FedAIoT, an FL benchmark for AIoT. FedAIoT includes eight datasets collected from a wide range of IoT devices. These datasets cover unique IoT modalities and target representative applications of AIoT. FedAIoT also includes a unified end-to-end FL framework for AIoT that simplifies benchmarking the performance of the datasets. Our benchmark results shed light on the opportunities and challenges of FL for AIoT. We hope FedAIoT could serve as an invaluable resource to foster advancements in the important field of FL for AIoT. The repository of FedAIoT is maintained at https://github.com/AIoT-MLSys-Lab/FedAIoT.

Submitted 19 June, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

arXiv:2309.17203 [pdf, other]

Balancing Both Behavioral Quality and Diversity in Unsupervised Skill Discovery

Authors: Xin Liu, Yaran Chen, Dongbin Zhao

Abstract: Unsupervised skill discovery seeks to discover diverse and exploratory skills without extrinsic reward, with the discovered skills efficiently adapting to multiple downstream tasks in various ways. However, recent advanced methods struggle to balance behavioral exploration and diversity, particularly when the agent dynamics are complex and potential skills are hard to discern (e.g., robot behavior discovery). In this paper, we propose Contrastive multi-objective Skill Discovery (ComSD), which discovers exploratory and diverse behaviors through a novel intrinsic incentive, named contrastive multi-objective reward. It contains a novel diversity reward based on contrastive learning to effectively drive agents to discern existing skills, and a particle-based exploration reward to access and learn new behaviors. Moreover, a novel dynamic weighting mechanism between the above two rewards is proposed for diversity-exploration balance, which further improves behavioral quality. Extensive experiments and analysis demonstrate that ComSD can generate diverse behaviors at different exploratory levels for complex multi-joint robots, enabling state-of-the-art performance across 32 challenging downstream adaptation tasks, which recent advanced methods cannot. Code will be released after publication.

Submitted 19 May, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2309.17061 [pdf, other]

SCALE: Synergized Collaboration of Asymmetric Language Translation Engines

Authors: Xin Cheng, Xun Wang, Tao Ge, Si-Qing Chen, Furu Wei, Dongyan Zhao, Rui Yan

Abstract: In this paper, we introduce SCALE, a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine. By introducing translation from STM into the triplet in-context demonstrations, SCALE unlocks the refinement and pivoting ability of LLMs, thus mitigating the language bias of LLMs and the parallel data bias of STMs, enhancing LLM speciality without sacrificing generality, and facilitating continual learning without expensive LLM fine-tuning. Our comprehensive experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings. Moreover, in Xhosa to English translation, SCALE achieves a consistent improvement of 4 BLEURT points without tuning the LLM and surpasses few-shot GPT-4 by 2.5 COMET points and 3.8 BLEURT points when equipped with a compact model consisting of merely 600M parameters. SCALE could also effectively exploit the existing language bias of LLMs by using an English-centric STM as a pivot for translation between any language pairs, outperforming few-shot GPT-4 by an average of 6 COMET points across eight translation directions. Furthermore, we provide an in-depth analysis of SCALE's robustness, translation characteristics, and latency costs, providing a solid foundation for future studies exploring the potential synergy between LLMs and more specialized, task-specific models.

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.16272 [pdf]

Cell nucleus elastography with the adjoint-based inverse solver

Authors: Yue Mei, Xuan Feng, Yun **, Rongyao Kang, Xinyu Wang, Dongmei Zhao, Soham Ghosh, Corey P Neu, Stephane Avril

Abstract: Background and Objectives: The mechanics of the nucleus depends on cellular structures and architecture, and impacts a number of diseases. Nuclear mechanics is, however, rather complex due to the heterogeneous distribution of dense heterochromatin and loose euchromatin domains, giving rise to spatially variable stiffness properties. Methods: In this study, we propose to use the adjoint-based inverse solver to identify for the first time the nonhomogeneous elastic property distribution of the nucleus. Inputs of the inverse solver are deformation fields measured with microscopic imaging in contracting cardiomyocytes. Results: The feasibility of the proposed method is first demonstrated using simulated data. Results indicate accurate identification of the assumed heterochromatin region, with a maximum relative error of less than 5%. We also investigate the influence of unknown Poisson's ratio on the reconstruction and find that variations of the Poisson's ratio in the range [0.3-0.5] result in uncertainties of less than 15% in the identified stiffness. Finally, we apply the inverse solver on actual deformation fields acquired within the nuclei of two cardiomyocytes. The obtained results are in good agreement with the density maps obtained from microscopy images. Conclusions: Overall, the proposed approach shows great potential for nuclear elastography, with promising value for emerging fields of mechanobiology and mechanogenetics.

Submitted 28 September, 2023; originally announced September 2023.

Journal ref: Computer Methods and Programs in Biomedicine, In press

arXiv:2309.16158 [pdf, other]

FireFly v2: Advancing Hardware Support for High-Performance Spiking Neural Network with a Spatiotemporal FPGA Accelerator

Authors: **dong Li, Guobin Shen, Dongcheng Zhao, Qian Zhang, Yi Zeng

Abstract: Spiking Neural Networks (SNNs) are expected to be a promising alternative to Artificial Neural Networks (ANNs) due to their strong biological interpretability and high energy efficiency. Specialized SNN hardware offers clear advantages over general-purpose devices in terms of power and performance. However, there is still room to advance hardware support for state-of-the-art (SOTA) SNN algorithms and improve computation and memory efficiency. As a further step in supporting high-performance SNNs on specialized hardware, we introduce FireFly v2, an FPGA SNN accelerator that can address the issue of non-spike operation in current SOTA SNN algorithms, which presents an obstacle in the end-to-end deployment onto existing SNN hardware. To more effectively align with the SNN characteristics, we design a spatiotemporal dataflow that allows four dimensions of parallelism and eliminates the need for membrane potential storage, enabling on-the-fly spike processing and spike generation. To further improve hardware acceleration performance, we develop a high-performance spike computing engine as a backend based on a systolic array operating at 500-600MHz. To the best of our knowledge, FireFly v2 achieves the highest clock frequency among all FPGA-based implementations. Furthermore, it stands as the first SNN accelerator capable of supporting non-spike operations, which are commonly used in advanced SNN algorithms. FireFly v2 has doubled the throughput and DSP efficiency when compared to our previous version of FireFly, and it exhibits 1.33 times the DSP efficiency and 1.42 times the power efficiency compared to the current most advanced FPGA accelerators.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.15606 [pdf, other]

From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven AI Chaining

Authors: Xiaoxue Ren, Xinyuan Ye, Dehai Zhao, Zhenchang Xing, Xiaohu Yang

Abstract: Large Language Models (LLMs) have shown promising results in automatic code generation by improving coding efficiency to a certain extent. However, generating high-quality and reliable code remains a formidable task because of LLMs' lack of good programming practice, especially in exception handling. In this paper, we first conduct an empirical study and summarise three crucial challenges of LLMs in exception handling, i.e., incomplete exception handling, incorrect exception handling and abuse of try-catch. We then try prompts with different granularities to address such challenges, finding that fine-grained knowledge-driven prompts work best. Based on our empirical study, we propose a novel Knowledge-driven Prompt Chaining-based code generation approach, named KPC, which decomposes code generation into an AI chain with iterative check-rewrite steps and chains fine-grained knowledge-driven prompts to assist LLMs in considering exception-handling specifications. We evaluate our KPC-based approach with 3,079 code generation tasks extracted from the Java official API documentation. Extensive experimental results demonstrate that the KPC-based approach has considerable potential to ameliorate the quality of code generated by LLMs. It achieves this through proficiently managing exceptions and obtaining remarkable enhancements of 109.86% and 578.57% with static evaluation methods, as well as a reduction of 18 runtime bugs in the sampled dataset with dynamic validation.

Submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted by 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

arXiv:2309.15516 [pdf, other]

Teaching Text-to-Image Models to Communicate in Dialog

Authors: Xiaowen Sun, Jiazhan Feng, Yuxuan Wang, Yuxuan Lai, Xingyu Shen, Dongyan Zhao

Abstract: A picture is worth a thousand words; thus, it is crucial for conversational agents to understand, perceive, and effectively respond with pictures. However, we find that directly employing conventional image generation techniques is inadequate for conversational agents to produce image responses effectively. In this paper, we focus on the innovative dialog-to-image generation task, where the model synthesizes a high-resolution image aligned with the given dialog context as a response. To tackle this problem, we design a tailored fine-tuning approach on top of state-of-the-art text-to-image generation models to fully exploit the structural and semantic features in dialog context during image generation. Concretely, we linearize the dialog context with specific indicators to maintain the dialog structure, and employ in-domain data to alleviate the style mismatch between dialog-to-image and conventional image generation tasks. Empirical results on PhotoChat and MMDialog Corpus show that our approach brings consistent and remarkable improvement with 3 state-of-the-art pre-trained text-to-image generation backbones.

Submitted 7 February, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Work in progress

arXiv:2309.15413 [pdf, other]

doi 10.1109/TPAMI.2023.3273574

Inherit with Distillation and Evolve with Contrast: Exploring Class Incremental Semantic Segmentation Without Exemplar Memory

Authors: Danpei Zhao, Bo Yuan, Zhenwei Shi

Abstract: As a front-burner problem in incremental learning, class incremental semantic segmentation (CISS) is plagued by catastrophic forgetting and semantic drift. Although recent methods have utilized knowledge distillation to transfer knowledge from the old model, they are still unable to avoid pixel confusion, which results in severe misclassification after incremental steps due to the lack of annotations for past and future classes. Meanwhile, data-replay-based approaches suffer from storage burdens and privacy concerns. In this paper, we propose to address CISS without exemplar memory and resolve catastrophic forgetting as well as semantic drift synchronously. We present Inherit with Distillation and Evolve with Contrast (IDEC), which consists of a Dense Knowledge Distillation on all Aspects (DADA) manner and an Asymmetric Region-wise Contrastive Learning (ARCL) module. Driven by the devised dynamic class-specific pseudo-labelling strategy, DADA distils intermediate-layer features and output-logits collaboratively with more emphasis on semantic-invariant knowledge inheritance. ARCL implements region-wise contrastive learning in the latent space to resolve semantic drift among known classes, current classes, and unknown classes. We demonstrate the effectiveness of our method through state-of-the-art performance on multiple CISS tasks, including the Pascal VOC 2012, ADE20K and ISPRS datasets. Our method also shows superior anti-forgetting ability, particularly in multi-step CISS tasks.
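For readers unfamiliar with output-logit distillation, the sketch below shows the standard temperature-scaled KL loss between an old (teacher) model and a new (student) model for a single pixel's class logits. It is the generic Hinton-style loss, assumed here for illustration; it does not reproduce DADA's feature distillation or its pseudo-labelling strategy.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float], student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions,
    multiplied by T^2 as in standard logit distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


# In segmentation this would be averaged over all pixels of the old classes.
print(distillation_loss([2.0, 0.0, -1.0], [0.0, 0.0, 0.0]))
```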

Submitted 27 September, 2023; originally announced September 2023.

Journal ref: IEEE TPAMI 2023

arXiv:2309.13956 [pdf, other]

In-Domain GAN Inversion for Faithful Reconstruction and Editability

Authors: Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen, Bolei Zhou

Abstract: Generative Adversarial Networks (GANs) have significantly advanced image synthesis by mapping randomly sampled latent codes to high-fidelity synthesized images. However, applying well-trained GANs to real image editing remains challenging. A common solution is to find an approximate latent code that can adequately recover the input image to edit, which is also known as GAN inversion. To invert a GAN model, prior works typically focus on reconstructing the target image at the pixel level, yet few studies have examined whether the inverted result can well support manipulation at the semantic level. This work fills this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model. In this way, we manage to sufficiently reuse the knowledge learned by GANs for image reconstruction, facilitating a wide range of editing applications without any retraining. We further provide a comprehensive analysis of the effects of the encoder structure, the starting inversion point, and the inversion parameter space, and observe a trade-off between reconstruction quality and editing property. Such a trade-off sheds light on how a GAN model represents an image with various semantics encoded in the learned latent distribution. Code, models, and demo are available at the project page: https://genforce.github.io/idinvert/.

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.13150 [pdf, other]

Pixel-wise Smoothing for Certified Robustness against Camera Motion Perturbations

Authors: Hanjiang Hu, Zuxin Liu, Linyi Li, Jiacheng Zhu, Ding Zhao

Abstract: Deep learning-based visual perception models lack robustness when faced with camera motion perturbations in practice. The current certification process for assessing robustness is costly and time-consuming due to the extensive number of image projections required for Monte Carlo sampling in the 3D camera motion space. To address these challenges, we present a novel, efficient, and practical framework for certifying the robustness of 3D-2D projective transformations against camera motion perturbations. Our approach leverages a smoothing distribution over the 2D pixel space instead of in the 3D physical space, eliminating the need for costly camera motion sampling and significantly enhancing the efficiency of robustness certifications. With the pixel-wise smoothed classifier, we are able to fully upper bound the projection errors using a technique of uniform partitioning in camera motion space. Additionally, we extend our certification framework to a more general scenario where only a single-frame point cloud is required in the projection oracle. Through extensive experimentation, we validate the trade-off between effectiveness and efficiency enabled by our proposed method. Remarkably, our approach achieves approximately 80% certified accuracy while utilizing only 30% of the projected image frames. The code is available at https://github.com/HanjiangHu/pixel-wise-smoothing.
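Smoothing over the 2D pixel space means the smoothed classifier is evaluated by perturbing pixels directly rather than re-projecting the scene under sampled camera motions. The sketch below shows generic Gaussian randomized smoothing by Monte Carlo majority vote; the Gaussian noise model, `sigma`, sample count, and toy brightness classifier are all assumptions, and the paper's certification bound via uniform partitioning of camera motion space is not shown.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Image = List[float]  # a flattened grayscale image with values roughly in [0, 1]


def smoothed_predict(classify: Callable[[Image], int], image: Sequence[float],
                     sigma: float = 0.1, n_samples: int = 200, seed: int = 0) -> int:
    """Monte Carlo estimate of a smoothed classifier: add i.i.d. Gaussian noise
    to every pixel, classify each noisy copy, and return the majority class."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_samples):
        noisy = [px + rng.gauss(0.0, sigma) for px in image]
        votes[classify(noisy)] += 1
    return votes.most_common(1)[0][0]


def mean_brightness_classifier(img: Image) -> int:
    # Toy base classifier: class 1 if the image is bright on average.
    return 1 if sum(img) / len(img) > 0.5 else 0


print(smoothed_predict(mean_brightness_classifier, [0.8] * 16))
```

A real certificate would additionally bound the top-class probability with a confidence interval rather than reporting only the majority vote.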

Submitted 2 March, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: Camera-ready version of AISTATS 2024, 30 pages, 5 figures, 13 tables

arXiv:2309.07911 [pdf, other]

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

Authors: Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

Abstract: Recently, large-scale pre-trained language-image models like CLIP have shown extraordinary capabilities for understanding spatial contents, but naively transferring such models to video recognition still suffers from unsatisfactory temporal modeling capabilities. Existing methods insert tunable structures into or in parallel with the pre-trained model, which either requires back-propagation through the whole pre-trained model and is thus resource-demanding, or is limited by the temporal reasoning capability of the pre-trained structure. In this work, we present DiST, which disentangles the learning of spatial and temporal aspects of videos. Specifically, DiST uses a dual-encoder structure, where a pre-trained foundation model acts as the spatial encoder, and a lightweight network is introduced as the temporal encoder. An integration branch is inserted between the encoders to fuse spatio-temporal information. The disentangled spatial and temporal learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters. Meanwhile, we empirically show that disentangled learning with an extra network for integration benefits both spatial and temporal understanding. Extensive experiments on five benchmarks show that DiST outperforms existing state-of-the-art methods by clear margins. When pre-training on the large-scale Kinetics-710, we achieve 89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST. Code and models are available at https://github.com/alibaba-mmai-research/DiST.

Submitted 14 September, 2023; originally announced September 2023.

Comments: ICCV2023. Code: https://github.com/alibaba-mmai-research/DiST

arXiv:2309.07284 [pdf]

Toward Lossless Homomorphic Encryption for Scientific Computation

Authors: Muhammad Jahanzeb Khan, Bo Fang, Dongfang Zhao

Abstract: This paper presents a comprehensive investigation into encrypted computations using the CKKS (Cheon-Kim-Kim-Song) scheme, with a focus on multi-dimensional vector operations and real-world applications. Through two meticulously designed experiments, the study explores the potential of the CKKS scheme in supercomputing and its implications for data privacy and computational efficiency. The first experiment reveals the promising applicability of CKKS to matrix multiplication, indicating marginal differences in Euclidean distance and near-zero mean square error across various matrix sizes. The second experiment, applied to a wildfire dataset, illustrates the feasibility of using encrypted machine learning models without significant loss in accuracy. The insights gleaned from the research set a robust foundation for future innovations, including the potential for GPU acceleration in CKKS computations within TenSEAL. Challenges such as noise budget computation, accuracy loss in multiplication, and the distinct characteristics of arithmetic operations in the context of CKKS are also discussed. The paper serves as a vital step towards understanding the complexities and potentials of encrypted computations, with broad implications for secure data processing and privacy preservation in various scientific domains.
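CKKS is approximate: real values are encoded as integers scaled by a factor Δ, so results carry small rounding errors, which is why Euclidean distance and mean square error are natural accuracy metrics. The plaintext sketch below mimics only that fixed-point scaling (no encryption, no encryption noise, no TenSEAL calls); the scale 2^20 and the example matrices are assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def encode(x: float, scale: float) -> int:
    # CKKS-style fixed-point encoding: multiply by the scale and round.
    return round(x * scale)


def approx_matmul(a: Matrix, b: Matrix, scale: float = 2.0 ** 20) -> Matrix:
    """Multiply matrices on scaled integers, then divide by scale^2 to decode.
    This only reproduces encoding rounding error; nothing is encrypted."""
    ea = [[encode(v, scale) for v in row] for row in a]
    eb = [[encode(v, scale) for v in row] for row in b]
    n, k, m = len(a), len(b), len(b[0])
    # Each product of two encoded values carries scale^2.
    return [[sum(ea[i][t] * eb[t][j] for t in range(k)) / scale ** 2
             for j in range(m)] for i in range(n)]


def exact_matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def euclidean_and_mse(x: Matrix, y: Matrix) -> Tuple[float, float]:
    diffs = [xi - yi for rx, ry in zip(x, y) for xi, yi in zip(rx, ry)]
    sq = sum(d * d for d in diffs)
    return math.sqrt(sq), sq / len(diffs)


a = [[1.1, 2.2], [3.3, 4.4]]
b = [[0.5, -1.25], [2.0, 0.75]]
approx = approx_matmul(a, b)
dist, mse = euclidean_and_mse(approx, exact_matmul(a, b))
print(dist, mse)
```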

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2309.04198 [pdf, other]

Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain

Authors: Yanrui Du, Sendong Zhao, Muzhen Cai, Ming Ma, Danyang Zhao, Jiawei Cao, Bing Qin

Abstract: Extensive studies have been devoted to privatizing general-domain Large Language Models (LLMs) as Domain-Specific LLMs by feeding them domain-specific data. However, these privatization efforts have often ignored a critical aspect: Dual Logic Ability, which is a core reasoning ability for LLMs. The dual logic ability of LLMs ensures that they can maintain a consistent stance when confronted with both positive and negative statements about the same fact. Our study focuses on how the dual logic ability of LLMs is affected during the privatization process in the medical domain. We conduct several experiments to analyze the dual logic ability of LLMs by examining the consistency of the stance in responses to paired questions about the same fact. Interestingly, our experiments reveal a significant decrease in the dual logic ability of existing LLMs after privatization. Besides, our results indicate that incorporating general-domain dual logic data into LLMs not only enhances LLMs' dual logic ability but also further improves their accuracy. These findings underscore the importance of prioritizing LLMs' dual logic ability during the privatization process. Our study establishes a benchmark for future research aimed at exploring LLMs' dual logic ability during the privatization process and offers valuable guidance for privatization efforts in real-world applications.
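Stance consistency on paired questions can be scored quite simply: for a positive statement of a fact and its negation, a logically consistent model should answer one "yes" and the other "no". The sketch below assumes yes/no answers have already been extracted from model responses; the scoring function is a hypothetical illustration, not the paper's benchmark metric.

```python
from typing import List, Tuple


def dual_logic_consistency(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of fact pairs answered consistently.

    Each pair holds a model's yes/no answers to a positive statement of a fact
    and to its negation; a consistent model flips its answer between the two."""
    if not pairs:
        raise ValueError("need at least one answer pair")
    consistent = sum(1 for pos, neg in pairs
                     if {pos.strip().lower(), neg.strip().lower()} == {"yes", "no"})
    return consistent / len(pairs)


answers = [("yes", "no"), ("yes", "yes"), ("No", "Yes"), ("no", "no")]
print(dual_logic_consistency(answers))  # 0.5
```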

Submitted 23 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

arXiv:2309.04107 [pdf]

Investigation of temperature stress tolerance in Arabidopsis STTM165/166 using electrophysiology and RNA-Seq

Authors: Dongjie Zhao, Qinghui Chen, Ziyang Wang, Lucy Arbanas, Guiliang Tang

Abstract: Plant electrical signals have been shown to be generated in response to various environmental stresses, but the relationship between these signals and stress tolerance is not well understood. In this study, we used the Arabidopsis STTM165/166 mutant, which exhibits enhanced temperature tolerance, to examine this relationship. Surface recording techniques were utilized to compare the generation ratio and duration characteristics of electrical signals in the STTM165/166 mutant and wild type (WT). Patch-clamp recording was employed to assess ion channel currents, specifically those of calcium ions. The current intensity of the mutant was found to be lower than that of the WT. As calcium ions are involved in the generation of plant electrical signals, we hypothesized that the reduced calcium channel activity in the mutant increased its electrical signal threshold. RNA-Seq analysis revealed differential expression of AHA genes in the STTM165/166 mutant, which may contribute to the prolonged depolarization phenotype. Gene Ontology enrichment of differentially expressed genes (DEGs) identified associations between these DEGs and various stresses, including temperature, salt, and those related to the jasmonic acid and abscisic acid pathways. These findings provide experimental evidence for the use of plant electrical signals in characterizing stress tolerance and explore potential ion mechanisms through patch-clamp recording and DEG Gene Ontology analysis.
They also emphasize the need for further research on the relationship between plant electrical signals and stress tolerance.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 20 pages, 5 figures

arXiv:2309.03802 [pdf, other]

The DESI One-Percent Survey: A concise model for galactic conformity of ELGs

Authors: Hongyu Gao, Y. P. Jing, Kun Xu, Donghai Zhao, Shanquan Gui, Yun Zheng, Xiaolin Luo, Jessica Nicole Aguilar, Steven Ahlen, David Brooks, Todd Claybaugh, Shaun Cole, Axel de la Macorra, Jaime E. Forero-Romero, Satya Gontcho A Gontcho, Mustapha Ishak, Andrew Lambert, Martin Landriau, Marc Manera, Aaron Meisner, Ramon Miquel, Jundan Nie, Mehdi Rezaie, Graziano Rossi, Eusebio Sanchez, et al. (5 additional authors not shown)

Abstract: Galactic conformity is the phenomenon in which a galaxy of a certain physical property is correlated with its neighbors of the same property, implying a possible causal relationship. The observed auto correlations of emission line galaxies (ELGs) from the highly complete DESI One-Percent survey exhibit a strong clustering signal on small scales, providing clear evidence for the conformity effect of ELGs. Building upon the original subhalo abundance matching (SHAM) method developed by Gao et al. (2022, 2023), we propose a concise conformity model to improve the ELG-halo connection. In this model, the number of satellite ELGs is boosted by a factor of $\sim 5$ in the halos whose central galaxies are ELGs. We show that the mean ELG satellite number in such central halos is still smaller than 1, and the model does not significantly increase the overall satellite fraction. With this model, we can well recover the ELG auto correlations down to the smallest scales explored with the current data (i.e., $r_{\mathrm{p}} > 0.03$ $\mathrm{Mpc}\,h^{-1}$ in real space and $s > 0.3$ $\mathrm{Mpc}\,h^{-1}$ in redshift space), while the cross correlations between luminous red galaxies (LRGs) and ELGs are nearly unchanged. Although our SHAM model has only 8 parameters, we further verify that it can accurately describe the ELG clustering in the entire redshift range from $z = 0.8$ to $1.6$. We therefore expect that this method can be used to generate high-quality ELG lightcone mocks for DESI.

Submitted 7 November, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: 18 pages, 10 figures, accepted by ApJ

arXiv:2309.01808 [pdf, other]

DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research

Authors: Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Kwei-Herng Lai, Daochen Zha, Ruixiang Tang, Fan Yang, Alfredo Costilla Reyes, Kaixiong Zhou, Xiaoqian Jiang, Xia Hu

Abstract: The exponential growth in scholarly publications necessitates advanced tools for efficient article retrieval, especially in interdisciplinary fields where diverse terminologies are used to describe similar research. Traditional keyword-based search engines often fall short in assisting users who may not be familiar with specific terminologies. To address this, we present a knowledge graph-based paper search engine for biomedical research to enhance the user experience in discovering relevant queries and articles. The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG. To reduce information overload, DiscoverPath presents users with a focused subgraph containing the queried entity and its neighboring nodes and incorporates a query recommendation system, enabling users to iteratively refine their queries. The system is equipped with an accessible Graphical User Interface that provides an intuitive visualization of the KG, query recommendations, and detailed article information, enabling efficient article retrieval, thus fostering interdisciplinary knowledge exploration. DiscoverPath is open-sourced at https://github.com/ynchuang/DiscoverPath.
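Presenting "a focused subgraph containing the queried entity and its neighboring nodes" is essentially extracting an ego graph. The sketch below does this with a breadth-first search over an undirected adjacency map; the toy biomedical graph and the `hops` parameter are assumptions, and relation labels from the extracted triples are not modelled.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets (assumed symmetric)


def focused_subgraph(graph: Graph, entity: str, hops: int = 1) -> Graph:
    """Return the subgraph induced by `entity` and every node within `hops` edges."""
    if entity not in graph:
        return {}
    depth = {entity: 0}
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the hop limit
        for nb in graph[node]:
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    keep = set(depth)
    # Keep only edges whose endpoints both lie inside the focused region.
    return {n: graph[n] & keep for n in keep}


kg = {
    "insulin": {"diabetes", "pancreas"},
    "diabetes": {"insulin", "metformin"},
    "pancreas": {"insulin"},
    "metformin": {"diabetes"},
}
print(focused_subgraph(kg, "insulin"))
```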

Submitted 10 October, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

arXiv:2309.00711 [pdf, other]

Learning Shared Safety Constraints from Multi-task Demonstrations

Authors: Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Zhiwei Steven Wu

Abstract: Regardless of the particular task we want them to perform in an environment, there are often shared safety constraints we want our agents to respect. For example, regardless of whether it is making a sandwich or clearing the table, a kitchen robot should not break a plate. Manually specifying such a constraint can be both time-consuming and error-prone. We show how to learn constraints from expert demonstrations of safe task completion by extending inverse reinforcement learning (IRL) techniques to the space of constraints. Intuitively, we learn constraints that forbid highly rewarding behavior that the expert could have taken but chose not to. Unfortunately, the constraint learning problem is rather ill-posed and typically leads to overly conservative constraints that forbid all behavior that the expert did not take. We counter this by leveraging diverse demonstrations that naturally occur in multi-task settings to learn a tighter set of constraints. We validate our method with simulation experiments on high-dimensional continuous control tasks.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.15254 [pdf, other]

A Generalized Density Dissipation for Weakly-compressible SPH

Authors: Bo Xue Zheng, Zhi Wen Cai, Pei Dong Zhao, Xiao Yang Xu, Tak Shing Chan, Peng Yu

Abstract: Weakly compressible Smoothed Particle Hydrodynamics (SPH) is known to suffer from pressure oscillations, which undermine simulation stability and accuracy. To address this issue, we propose a generalized density dissipation scheme suitable for both single-phase and multiphase flow simulations. Our approach consists of two components. Firstly, we replace the basic density dissipation with the density increment dissipation to enable numerical dissipation across the interfaces of different fluids in multiphase flow. Secondly, based on the dissipation volume conservation, we utilize a dissipation volume correction factor (VCF) to stabilize the simulations for multiphase flows with large density ratios. We demonstrate the accuracy, stability, and robustness of our method through four three-dimensional benchmarks, i.e., sloshing under external excitations, single and double bubbles rising, Rayleigh-Taylor instability, and Kelvin-Helmholtz instability. Additionally, our study reveals the relationship between SPH with density dissipation and the approximate Riemann solver.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.14838 [pdf, other]

doi 10.1145/3583780.3615071

Tackling Diverse Minorities in Imbalanced Classification

Authors: Kwei-Herng Lai, Daochen Zha, Huiyuan Chen, Mangesh Bendre, Yuzhong Chen, Mahashweta Das, Hao Yang, Xia Hu

Abstract: Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. When working with large datasets, the imbalance issue can be further exacerbated, making it exceptionally difficult to train classifiers effectively. To address the problem, over-sampling techniques have been developed that linearly interpolate data instances between minorities and their neighbors. However, in many real-world scenarios such as anomaly detection, minority instances are often dispersed diversely in the feature space rather than clustered together. Inspired by domain-agnostic data mix-up, we propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. Developing such a framework is non-trivial; the challenges include source sample selection, mix-up strategy selection, and the coordination between the underlying model and mix-up strategies. To tackle these challenges, we formulate the problem of iterative data mix-up as a Markov decision process (MDP) that maps data attributes onto an augmentation strategy. To solve the MDP, we employ an actor-critic framework to adapt to the discrete-continuous decision space. This framework is utilized to train a data augmentation policy and design a reward signal that explores classifier uncertainty and encourages performance improvement, irrespective of the classifier's convergence.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets using three different types of classifiers. The results of these experiments showcase the potential and promise of our framework in addressing imbalanced datasets with diverse minorities.
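The basic mix-up operation behind this approach is a convex combination of a minority sample with another sample, which may come from either class. In the sketch below, source samples and the mixing weight are drawn at random as a stand-in for the learned actor-critic policy described above; the uniform range for `lam` is an assumption.

```python
import random
from typing import List, Sequence


def mixup(x_minority: Sequence[float], x_other: Sequence[float], lam: float) -> List[float]:
    """Convex combination lam * x_minority + (1 - lam) * x_other."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_minority, x_other)]


def random_mixup_samples(minority: Sequence[Sequence[float]],
                         majority: Sequence[Sequence[float]],
                         n: int, lam_low: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Generate n synthetic samples, each mixing a random minority instance with a
    random instance from either class. Random choices replace a learned policy."""
    rng = random.Random(seed)
    pool = list(minority) + list(majority)
    out = []
    for _ in range(n):
        a = rng.choice(minority)
        b = rng.choice(pool)
        lam = rng.uniform(lam_low, 1.0)  # keep the sample closer to the minority point
        out.append(mixup(a, b, lam))
    return out


print(mixup([0.0, 0.0], [2.0, 4.0], 0.25))  # [1.5, 3.0]
```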

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.12578 [pdf, other]

Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models

Authors: Yachao Zhao, Bo Wang, Dongming Zhao, Kun Huang, Yan Wang, Ruifang He, Yuexian Hou

Abstract: Recent research indicates that Pre-trained Large Language Models (LLMs) possess cognitive constructs similar to those observed in humans, prompting researchers to investigate the cognitive aspects of LLMs. This paper focuses on explicit and implicit social bias, a distinctive two-level cognitive construct in psychology. It posits that individuals' explicit social bias, which is their conscious expression of bias in statements, may differ from their implicit social bias, which represents their unconscious bias. We propose a two-stage approach and discover a parallel phenomenon in LLMs known as "re-judge inconsistency" in social bias. In the initial stage, the LLM is tasked with automatically completing statements, potentially incorporating implicit social bias. However, in the subsequent stage, the same LLM re-judges the biased statement it generated but contradicts it. We propose that this re-judge inconsistency may be similar to the inconsistency between humans' unconscious implicit social bias and their conscious explicit social bias. Experimental investigations on ChatGPT and GPT-4 concerning common gender biases examined in psychology corroborate the highly stable nature of the re-judge inconsistency. This finding may suggest that diverse cognitive constructs emerge as LLMs' capabilities strengthen. Consequently, leveraging psychological theories can provide enhanced insights into the underlying mechanisms governing the expressions of explicit and implicit constructs in LLMs.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.12063 [pdf, other]

Learning the Plasticity: Plasticity-Driven Learning Framework in Spiking Neural Networks

Authors: Guobin Shen, Dongcheng Zhao, Yiting Dong, Yang Li, Feifei Zhao, Yi Zeng

Abstract: The evolution of the human brain has led to the development of complex synaptic plasticity, enabling dynamic adaptation to a constantly evolving world. This progress inspires our exploration into a new paradigm for Spiking Neural Networks (SNNs): a Plasticity-Driven Learning Framework (PDLF). This paradigm diverges from traditional neural network models that primarily focus on direct training of synaptic weights, leading to static connections that limit adaptability in dynamic environments. Instead, our approach delves into the heart of synaptic behavior, prioritizing the learning of plasticity rules themselves. This shift in focus from weight adjustment to mastering the intricacies of synaptic change offers a more flexible and dynamic pathway for neural networks to evolve and adapt. Our PDLF does not merely adapt existing concepts of functional and Presynaptic-Dependent Plasticity but redefines them, aligning closely with the dynamic and adaptive nature of biological learning. This reorientation enhances key cognitive abilities in artificial intelligence systems, such as working memory and multitasking capabilities, and demonstrates superior adaptability in complex, real-world scenarios. Moreover, our framework sheds light on the intricate relationships between various forms of plasticity and cognitive functions, thereby contributing to a deeper understanding of the brain's learning mechanisms. Integrating this groundbreaking plasticity-centric approach in SNNs marks a significant advancement in the fusion of neuroscience and artificial intelligence.
It paves the way for developing AI systems that not only learn but also adapt in an ever-changing world, much like the human brain.
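The shift from learning weights to learning plasticity rules can be illustrated with the widely used parameterized Hebbian ("ABCD") rule, in which the trainable quantities are the rule's coefficients and the weights change online according to pre- and post-synaptic activity. This is a generic textbook form assumed for illustration; it is not the PDLF formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlasticityRule:
    """Generic parameterized Hebbian rule (not PDLF):
    dw = eta * (a * pre * post + b * pre + c * post + d).
    In a plasticity-learning setup, eta and a..d are the learned parameters."""
    eta: float
    a: float
    b: float
    c: float
    d: float

    def delta(self, pre: float, post: float) -> float:
        return self.eta * (self.a * pre * post + self.b * pre + self.c * post + self.d)


def apply_rule(weights: List[List[float]], pre: List[float], post: List[float],
               rule: PlasticityRule) -> List[List[float]]:
    """Update weights[i][j] (pre-synaptic j -> post-synaptic i) in place and return them."""
    for i, post_i in enumerate(post):
        for j, pre_j in enumerate(pre):
            weights[i][j] += rule.delta(pre_j, post_i)
    return weights


# Pure Hebbian case: only co-active pre/post pairs are strengthened.
hebbian = PlasticityRule(eta=0.5, a=1.0, b=0.0, c=0.0, d=0.0)
print(apply_rule([[0.0, 0.0]], pre=[1.0, 0.0], post=[1.0], rule=hebbian))  # [[0.5, 0.0]]
```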

Submitted 1 February, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.11237 [pdf, other]

Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation

Authors: Chao Ni, Xin Yin, Kaiwen Yang, Dehai Zhao, Zhenchang Xing, Xin Xia

Abstract: Though many deep learning (DL)-based vulnerability detection approaches have been proposed and have achieved remarkable performance, they still have limitations in generalization and practical usage. More precisely, existing DL-based approaches (1) perform poorly on prediction tasks involving functions that are lexically similar but semantically contrary; (2) provide no intuitive, developer-oriented explanations of the detected results. In this paper, we propose a novel approach named SVulD, a function-level Subtle semantic embedding for Vulnerability Detection along with intuitive explanations, to alleviate the above limitations. Specifically, SVulD first trains a model to learn distinguishing semantic representations of functions regardless of their lexical similarity. Then, for the detected vulnerable functions, SVulD provides natural language explanations (e.g., root cause) of the results to help developers intuitively understand the vulnerabilities. To evaluate the effectiveness of SVulD, we conduct large-scale experiments on a widely used practical vulnerability dataset and compare it with four state-of-the-art (SOTA) approaches using five performance measures. The experimental results indicate that SVulD outperforms all SOTAs with a substantial improvement (i.e., 23.5%-68.0% in terms of F1-score, 15.9%-134.8% in terms of PR-AUC and 7.4%-64.4% in terms of Accuracy). In addition, we conduct a user case study to evaluate how useful SVulD is for developers in understanding vulnerable code, and the participants' feedback demonstrates that SVulD is helpful in development practice.
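The abstract reports results in F1-score, PR-AUC and Accuracy for a binary vulnerable/non-vulnerable task. The sketch below shows how these measures are commonly computed; it approximates PR-AUC by average precision and assumes the reported gains are relative improvements, (new - old) / old, which the abstract does not state explicitly. All function names are illustrative, not taken from the SVulD code.

```python
from typing import Sequence


def confusion_counts(labels: Sequence[int], preds: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = vulnerable."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tp, fp, fn, tn


def f1_and_accuracy(labels: Sequence[int], preds: Sequence[int]) -> tuple[float, float]:
    """F1 of the vulnerable class, and overall accuracy."""
    tp, fp, fn, tn = confusion_counts(labels, preds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(labels)
    return f1, accuracy


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC estimate: mean precision at each true positive, ranked by score.

    Tied scores keep their input order (no special tie handling).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline` (assumed reading of the reported ranges)."""
    return (new - baseline) / baseline * 100.0
```

Under this reading, a gain above 100% (as in the PR-AUC range) simply means the new score is more than twice the baseline's.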

Submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by FSE'23

arXiv:2308.10278 [pdf, other]

CharacterChat: Learning towards Conversational AI with Personalized Social Support

Authors: Quan Tu, Chuanqi Chen, **peng Li, Yanran Li, Shuo Shang, Dongyan Zhao, Ran Wang, Rui Yan

Abstract: In our modern, fast-paced, and interconnected world, mental well-being has become a matter of great urgency. However, traditional methods such as Emotional Support Conversations (ESC) face challenges in effectively addressing a diverse range of individual personalities. In response, we introduce the Social Support Conversation (S2Conv) framework. It comprises a series of support agents and an interpersonal matching mechanism that links individuals with persona-compatible virtual supporters. Utilizing persona decomposition based on the MBTI (Myers-Briggs Type Indicator), we have created the MBTI-1024 Bank, a group of virtual characters with distinct profiles. Through improved role-playing prompts with behavior presets and dynamic memory, we facilitate the development of the MBTI-S2Conv dataset, which contains conversations between the characters in the MBTI-1024 Bank. Building upon these foundations, we present CharacterChat, a comprehensive S2Conv system, which includes a conversational model driven by personas and memories, along with an interpersonal matching plugin model that dispatches the optimal supporters from the MBTI-1024 Bank for individuals with specific personas. Empirical results indicate the remarkable efficacy of CharacterChat in providing personalized social support and highlight the substantial advantages derived from interpersonal matching. The source code is available at https://github.com/morecry/CharacterChat.
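The matching step can be pictured as: decompose each persona into the four MBTI dichotomies, score every supporter in the bank against the user, and dispatch the top-scoring ones. In CharacterChat this scoring is done by a learned plugin model; the sketch below swaps in a hypothetical hand-written score (weighted agreement across dimensions) purely to show the dispatch flow. The `Supporter` record, `compatibility` rule and weights are illustrative assumptions.

```python
from dataclasses import dataclass

# The four MBTI dichotomies, in the conventional letter order.
DICHOTOMIES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def decompose(mbti: str) -> tuple[str, ...]:
    """Split a four-letter MBTI code such as 'INFJ' into its four dimension letters."""
    code = mbti.upper()
    if len(code) != 4 or any(c not in pair for c, pair in zip(code, DICHOTOMIES)):
        raise ValueError(f"not a valid MBTI code: {mbti!r}")
    return tuple(code)


@dataclass
class Supporter:
    """Hypothetical record for one virtual character in the bank."""
    name: str
    mbti: str


def compatibility(user: str, supporter: str, weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Hypothetical score: weighted count of dimensions on which both personas agree.

    A learned matching model would replace this function.
    """
    return sum(w for w, a, b in zip(weights, decompose(user), decompose(supporter)) if a == b)


def dispatch(user_mbti: str, bank: list[Supporter], k: int = 1) -> list[Supporter]:
    """Return the k highest-scoring supporters; ties keep bank order (stable sort)."""
    return sorted(bank, key=lambda s: compatibility(user_mbti, s.mbti), reverse=True)[:k]
```

Keeping the scorer a plain function makes it easy to replace with a trained model without touching the dispatch logic.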

Submitted 20 August, 2023; originally announced August 2023.

Comments: 10 pages, 6 figures, 5 tables

arXiv:2308.09351 [pdf, other]

RLIPv2: Fast Scaling of Relational Language-Image Pre-training

Authors: Hangjie Yuan, Shiwei Zhang, Xiang Wang, Samuel Albanie, Yining Pan, Tao Feng, Jianwen Jiang, Dong Ni, Yingya Zhang, Deli Zhao

Abstract: Relational Language-Image Pre-training (RLIP) aims to align vision representations with relational texts, thereby advancing the capability of relational reasoning in computer vision tasks. However, hindered by the slow convergence of the RLIPv1 architecture and the limited availability of existing scene graph data, scaling RLIPv1 is challenging. In this paper, we propose RLIPv2, a fast-converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data. To enable fast scaling, RLIPv2 introduces Asymmetric Language-Image Fusion (ALIF), a mechanism that facilitates earlier and deeper gated cross-modal fusion with sparsified language encoding layers. ALIF leads to comparable or better performance than RLIPv1 in a fraction of the time for pre-training and fine-tuning. To obtain scene graph data at scale, we extend object detection datasets with free-form relation labels by introducing a captioner (e.g., BLIP) and a designed Relation Tagger. The Relation Tagger assigns BLIP-generated relation texts to region pairs, thus enabling larger-scale relational pre-training. Through extensive experiments on Human-Object Interaction Detection and Scene Graph Generation, RLIPv2 shows state-of-the-art performance on three benchmarks under fully fine-tuned, few-shot and zero-shot settings. Notably, the largest RLIPv2 achieves 23.29 mAP on HICO-DET without any fine-tuning, 32.22 mAP with just 1% of the data and 45.09 mAP with 100% of the data. Code and models are publicly available at https://github.com/JacobYuan7/RLIPv2.
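ALIF is described as gated cross-modal fusion applied earlier and deeper in the network. One common form of gated fusion, used here as an assumption rather than as RLIPv2's exact design, adds a cross-attention update from language tokens to the vision tokens, scaled by tanh of a learnable gate that starts at zero so the fused layer initially behaves as an identity. This pure-Python sketch omits learned projections, multiple heads and the sparsified language layers; all names are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: list[list[float]], keys_values: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention of vision queries over language tokens.

    Keys and values are the language vectors themselves (no projections in this sketch).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(d)])
    return out


def gated_fusion(vision: list[list[float]], language: list[list[float]], gate: float) -> list[list[float]]:
    """Residual update vision + tanh(gate) * CrossAttn(vision, language).

    With gate = 0 the vision tokens pass through unchanged.
    """
    attended = cross_attention(vision, language)
    g = math.tanh(gate)
    return [[v + g * a for v, a in zip(v_row, a_row)] for v_row, a_row in zip(vision, attended)]
```

Starting the gate at zero lets fusion be switched on gradually during training without disturbing the pretrained vision features at initialization.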

Submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023. Code and models: https://github.com/JacobYuan7/RLIPv2

arXiv:2308.07614 [pdf]

Single channel based interference-free and self-powered human-machine interactive interface using eigenfrequency-dominant mechanism

Authors: Sen Ding, Dazhe Zhao, Yongyao Chen, Ziyi Dai, Qian Zhao, Yibo Gao, Junwen Zhong, Jianyi Luo, Bingpu Zhou

Abstract: The recent development of wearable devices is revolutionizing human-machine interaction (HMI). Nowadays, an interactive interface that carries more embedded information is desired to fulfil the increasing demands of the Internet of Things era. However, present approaches normally rely on sensor arrays for memory expansion, which inevitably raises concerns about wiring complexity, signal differentiation, power consumption, and miniaturization. Herein, a single-channel self-powered HMI interface, which uses the eigenfrequency of a magnetized micropillar (MMP) as its identification mechanism, is reported. When the MMP is manually vibrated, its inherent recovery causes a damped oscillation that generates current signals through Faraday's law of induction. Time-to-frequency conversion reveals the MMP-related eigenfrequency, which provides a specific way to allocate diverse commands in an interference-free manner even with a single electrical channel. A cylindrical cantilever model was built to regulate the MMP eigenfrequencies by precisely designing the dimensional parameters and material properties. We show that, using one device and two electrodes, a high-capacity HMI interface can be realized when MMPs with different eigenfrequencies are integrated. This study provides a reference for designing future HMI systems, especially for situations that require a more intuitive and intelligent communication experience with high memory demand.
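Two steps in this abstract can be sketched in code: predicting a micropillar's eigenfrequency from its dimensions and material, and recovering that frequency from the damped induced-current signal. The sketch assumes an ideal Euler-Bernoulli cylindrical cantilever in its first bending mode, f1 = (1.8751^2 / (2π L^2)) · (d/4) · sqrt(E/ρ), which may differ from the authors' model, and uses a naive discrete Fourier transform as a stand-in for their time-to-frequency conversion. The function names and the synthetic signal are hypothetical.

```python
import math


def cantilever_eigenfrequency(length_m: float, diameter_m: float,
                              youngs_modulus_pa: float, density_kg_m3: float) -> float:
    """First bending-mode frequency (Hz) of an ideal Euler-Bernoulli cylindrical cantilever.

    f1 = (beta1*L)^2 / (2*pi*L^2) * sqrt(E*I / (rho*A)), and I/A = d^2/16 for a solid cylinder.
    """
    beta1_l = 1.8751  # first root of 1 + cos(x) * cosh(x) = 0
    return ((beta1_l ** 2) / (2 * math.pi * length_m ** 2)
            * (diameter_m / 4) * math.sqrt(youngs_modulus_pa / density_kg_m3))


def damped_oscillation(freq_hz: float, decay_per_s: float,
                       sample_rate_hz: float, n_samples: int) -> list[float]:
    """Synthetic stand-in for the induced current: an exponentially damped sine."""
    return [math.exp(-decay_per_s * k / sample_rate_hz)
            * math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(n_samples)]


def dominant_frequency(signal: list[float], sample_rate_hz: float) -> float:
    """Frequency (Hz) of the largest DFT magnitude bin, excluding DC.

    Naive O(N^2) DFT; the resolution is sample_rate_hz / len(signal).
    """
    n = len(signal)
    best_bin, best_mag = 1, -1.0
    for b in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * b * k / n) for k, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * b * k / n) for k, x in enumerate(signal))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = b, mag
    return best_bin * sample_rate_hz / n
```

Under this model, f1 scales linearly with the pillar diameter and with 1/L^2, which is why dimensional design alone can give each MMP a distinct frequency. With n samples at rate fs, peaks closer together than fs/n cannot be separated by this simple search.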

Submitted 15 August, 2023; originally announced August 2023.

Comments: 35 pages, 6 figures

Showing 151–200 of 1,041 results for author: Zha, D