Search | arXiv e-print repository

A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes

Authors: Zhenwei Lin, Chenyu Xue, Qi Deng, Yinyu Ye

Abstract: Robust Markov Decision Processes (RMDPs) have recently been recognized as a valuable and promising approach to discovering a policy with creditable performance, particularly in the presence of a dynamic environment and estimation errors in the transition matrix due to limited data. Despite extensive exploration of dynamic programming algorithms for solving RMDPs, there has been a notable upswing i… ▽ More Robust Markov Decision Processes (RMDPs) have recently been recognized as a valuable and promising approach to discovering a policy with creditable performance, particularly in the presence of a dynamic environment and estimation errors in the transition matrix due to limited data. Despite extensive exploration of dynamic programming algorithms for solving RMDPs, there has been a notable upswing in interest in develo** efficient algorithms using the policy gradient method. In this paper, we propose the first single-loop robust policy gradient (SRPG) method with the global optimality guarantee for solving RMDPs through its minimax formulation. Moreover, we complement the convergence analysis of the nonconvex-nonconcave min-max optimization problem with the objective function's gradient dominance property, which is not explored in the prior literature. Numerical experiments validate the efficacy of SRPG, demonstrating its faster and more robust convergence behavior compared to its nested-loop counterpart. △ Less

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2406.00115 [pdf, other]

Towards LLM-Powered Verilog RTL Assistant: Self-Verification and Self-Correction

Authors: Hanxian Huang, Zhenghan Lin, Zixuan Wang, Xin Chen, Ke Ding, Jishen Zhao

Abstract: We explore the use of Large Language Models (LLMs) to generate high-quality Register-Transfer Level (RTL) code with minimal human interference. The traditional RTL design workflow requires human experts to manually write high-quality RTL code, which is time-consuming and error-prone. With the help of emerging LLMs, developers can describe their requirements to LLMs which then generate correspondin… ▽ More We explore the use of Large Language Models (LLMs) to generate high-quality Register-Transfer Level (RTL) code with minimal human interference. The traditional RTL design workflow requires human experts to manually write high-quality RTL code, which is time-consuming and error-prone. With the help of emerging LLMs, developers can describe their requirements to LLMs which then generate corresponding code in Python, C, Java, and more. Adopting LLMs to generate RTL design in hardware description languages is not trivial, given the complex nature of hardware design and the generated design has to meet the timing and physical constraints. We propose VeriAssist, an LLM-powered programming assistant for Verilog RTL design workflow. VeriAssist takes RTL design descriptions as input and generates high-quality RTL code with corresponding test benches. VeriAssist enables the LLM to self-correct and self-verify the generated code by adopting an automatic prompting system and integrating RTL simulator in the code generation loop. To generate an RTL design, VeriAssist first generates the initial RTL code and corresponding test benches, followed by a self-verification step that walks through the code with test cases to reason the code behavior at different time steps, and finally it self-corrects the code by reading the compilation and simulation results and generating final RTL code that fixes errors in compilation and simulation. This design fully leverages the LLMs' capabilities on multi-turn interaction and chain-of-thought reasoning to improve the quality of the generated code. We evaluate VeriAssist with various benchmark suites and find it significantly improves both syntax and functionality correctness over existing LLM implementations, thus minimizing human intervention and making RTL design more accessible to novice designers. △ Less

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.20584 [pdf, other]

Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization

Authors: Yisu Liu, **yang An, Wanqian Zhang, Dayan Wu, **gzi Gu, Zheng Lin, Wei** Wang

Abstract: With the development of diffusion-based customization methods like DreamBooth, individuals now have access to train the models that can generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against cu… ▽ More With the development of diffusion-based customization methods like DreamBooth, individuals now have access to train the models that can generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against customization. The adversarial examples are trained to distort the customization model's outputs and thus block the misuse. In this paper, we propose DisDiff (Disrupting Diffusion), a novel adversarial attack method to disrupt the diffusion model outputs. We first delve into the intrinsic image-text relationships, well-known as cross-attention, and empirically find that the subject-identifier token plays an important role in guiding image generation. Thus, we propose the Cross-Attention Erasure module to explicitly "erase" the indicated attention maps and disrupt the text guidance. Besides,we analyze the influence of the sampling process of the diffusion model on Projected Gradient Descent (PGD) attack and introduce a novel Merit Sampling Scheduler to adaptively modulate the perturbation updating amplitude in a step-aware manner. Our DisDiff outperforms the state-of-the-art methods by 12.75% of FDFR scores and 7.25% of ISM scores across two facial benchmarks and two commonly used prompts on average. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: Under review

ACM Class: I.2.10

arXiv:2405.20320 [pdf, other]

Improving the Training of Rectified Flows

Authors: Sangyun Lee, Zinan Lin, Giulia Fanti

Abstract: Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE. One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of functio… ▽ More Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE. One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting. Our main insight is that under realistic settings, a single iteration of the Reflow algorithm for training rectified flows is sufficient to learn nearly straight trajectories; hence, the current practice of using multiple Reflow iterations is unnecessary. We thus propose techniques to improve one-round training of rectified flows, including a U-shaped timestep distribution and LPIPS-Huber premetric. With these techniques, we improve the FID of the previous 2-rectified flow by up to 72% in the 1 NFE setting on CIFAR-10. On ImageNet 64$\times$64, our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step settings and rivals the performance of improved consistency training (iCT) in FID. Code is available at https://github.com/sangyun884/rfpp. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19764 [pdf, other]

Macro-scale roughness reveals the complex history of asteroids Didymos and Dimorphos

Authors: Jean-Baptiste Vincent, Erik Asphaug, Olivier Barnouin, Joel Beccarelli, Paula G. Benavidez, Adriano Campo-Bagatin, Nancy L. Chabot, Carolyn M. Ernst, Pedro H. Hasselmann, Masatoshi Hirabayashi, Simone Ieva, Ozgur Karatekin, Tomas Kasparek, Tomas Kohout, Zhong-Yi Lin, Alice Lucchetti, Patrick Michel, Naomi Murdoch, Maurizio Pajola, Laura M. Parro, Sabina D. Raducan, Jessica Sunshine, Gonzalo Tancredi, Josep M. Trigo-Rodriguez, Angelo Zinzi

Abstract: Morphological map** is a fundamental step in studying the processes that shaped an asteroid surface. Yet, it is challenging and often requires multiple independent assessments by trained experts. Here, we present fast methods to detect and characterize meaningful terrains from the topographic roughness: entropy of information, and local mean surface orientation. We apply our techniques to Didymo… ▽ More Morphological map** is a fundamental step in studying the processes that shaped an asteroid surface. Yet, it is challenging and often requires multiple independent assessments by trained experts. Here, we present fast methods to detect and characterize meaningful terrains from the topographic roughness: entropy of information, and local mean surface orientation. We apply our techniques to Didymos and Dimorphos, the target asteroids of NASA's DART mission: first attempt to deflect an asteroid. Our methods reliably identify morphological units at multiple scales. The comparative study reveals various terrain types, signatures of processes that transformed Didymos and Dimorphos. Didymos shows the most heterogeneity and morphology that indicate recent resurfacing events. Dimorphos is comparatively rougher than Didymos, which may result from the formation process of the binary pair and past interaction between the two bodies. Our methods can be readily applied to other bodies and data sets. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: submitted to PSJ

arXiv:2405.18810 [pdf, other]

UniPTS: A Unified Framework for Proficient Post-Training Sparsity

Authors: **g**g Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji

Abstract: Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three… ▽ More Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS. Our endeavors particularly comprise (1) A base-decayed sparsity objective that promotes efficient knowledge transferring from dense network to the sparse counterpart. (2) A reducing-regrowing search algorithm designed to ascertain the optimal sparsity distribution while circumventing overfitting to the small calibration set in PTS. (3) The employment of dynamic sparse training predicated on the preceding aspects, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks. As an illustration, it amplifies the performance of POT, a recently proposed recipe, from 3.9% to 68.6% when pruning ResNet-50 at 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR2024

arXiv:2405.18800 [pdf]

Face processing emerges from object-trained convolutional neural networks

Authors: Zhenhua Zhao, Ji Chen, Zhicheng Lin, Haojiang Ying

Abstract: Whether face processing depends on unique, domain-specific neurocognitive mechanisms or domain-general object recognition mechanisms has long been debated. Directly testing these competing hypotheses in humans has proven challenging due to extensive exposure to both faces and objects. Here, we systematically test these hypotheses by capitalizing on recent progress in convolutional neural networks… ▽ More Whether face processing depends on unique, domain-specific neurocognitive mechanisms or domain-general object recognition mechanisms has long been debated. Directly testing these competing hypotheses in humans has proven challenging due to extensive exposure to both faces and objects. Here, we systematically test these hypotheses by capitalizing on recent progress in convolutional neural networks (CNNs) that can be trained without face exposure (i.e., pre-trained weights). Domain-general mechanism accounts posit that face processing can emerge from a neural network without specialized pre-training on faces. Consequently, we trained CNNs solely on objects and tested their ability to recognize and represent faces as well as objects that look like faces (face pareidolia stimuli).... Due to the character limits, for more details see in attached pdf △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 31 pages, 5 Figures

arXiv:2405.17873 [pdf, other]

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Authors: Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang

Abstract: Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenge for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduces the inference time by reducing the denoising steps. However, their memory consumptions are still excessive. The Post Training Quantiz… ▽ More Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenge for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduces the inference time by reducing the denoising steps. However, their memory consumptions are still excessive. The Post Training Quantization (PTQ) replaces high bit-width FP representation with low-bit integer values (INT4/8) , which is an effective and efficient technique to reduce the memory cost. However, when applying to few-step diffusion models, existing quantization methods face challenges in preserving both the image quality and text alignment. To address this issue, we propose an mixed-precision quantization framework - MixDQ. Firstly, We design specialized BOS-aware quantization method for highly sensitive text embedding quantization. Then, we conduct metric-decoupled sensitivity analysis to measure the sensitivity of each layer. Finally, we develop an integer-programming-based method to conduct bit-width allocation. While existing quantization methods fall short at W8A8, MixDQ could achieve W8A8 without performance loss, and W4A8 with negligible visual degradation. Compared with FP16, we achieve 3-4x reduction in model size and memory cost, and 1.45x latency speedup. △ Less

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Project Page: https://a-suozhang.xyz/mixdq.github.io/

arXiv:2405.17764 [pdf, other]

On the Sequence Evaluation based on Stochastic Processes

Authors: Tianhao Zhang, Zhexiao Lin, Zhecheng Sheng, Chen Jiang, Dongyeop Kang

Abstract: Modeling and analyzing long sequences of text is an essential task for Natural Language Processing. Success in capturing long text dynamics using neural language models will facilitate many downstream tasks such as coherence evaluation, text generation, machine translation and so on. This paper presents a novel approach to model sequences through a stochastic process. We introduce a likelihood-bas… ▽ More Modeling and analyzing long sequences of text is an essential task for Natural Language Processing. Success in capturing long text dynamics using neural language models will facilitate many downstream tasks such as coherence evaluation, text generation, machine translation and so on. This paper presents a novel approach to model sequences through a stochastic process. We introduce a likelihood-based training objective for the text encoder and design a more thorough measurement (score) for long text evaluation compared to the previous approach. The proposed training objective effectively preserves the sequence coherence, while the new score comprehensively captures both temporal and spatial dependencies. Theoretical properties of our new score show its advantages in sequence evaluation. Experimental results show superior performance in various sequence evaluation tasks, including global and local discrimination within and between documents of different lengths. We also demonstrate the encoder achieves competitive results on discriminating human and AI written text. △ Less

Submitted 15 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17247 [pdf, other]

An Introduction to Vision-Language Modeling

Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie , et al. (16 additional authors not shown)

Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol… ▽ More Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technology. However, there are many challenges that need to be addressed to improve the reliability of those models. While language is discrete, vision evolves in a much higher dimensional space in which concepts cannot always be easily discretized. To better understand the mechanics behind map** vision to language, we present this introduction to VLMs which we hope will help anyone who would like to enter the field. First, we introduce what VLMs are, how they work, and how to train them. Then, we present and discuss approaches to evaluate VLMs. Although this work primarily focuses on map** images to language, we also discuss extending VLMs to videos. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17053 [pdf, other]

WirelessLLM: Empowering Large Language Models Towards Wireless Intelligence

Authors: Jiawei Shao, **gwen Tong, Qiong Wu, Wei Guo, Zijian Li, Zehong Lin, Jun Zhang

Abstract: The rapid evolution of wireless technologies and the growing complexity of network infrastructures necessitate a paradigm shift in how communication networks are designed, configured, and managed. Recent advancements in Large Language Models (LLMs) have sparked interest in their potential to revolutionize wireless communication systems. However, existing studies on LLMs for wireless systems are li… ▽ More The rapid evolution of wireless technologies and the growing complexity of network infrastructures necessitate a paradigm shift in how communication networks are designed, configured, and managed. Recent advancements in Large Language Models (LLMs) have sparked interest in their potential to revolutionize wireless communication systems. However, existing studies on LLMs for wireless systems are limited to a direct application for telecom language understanding. To empower LLMs with knowledge and expertise in the wireless domain, this paper proposes WirelessLLM, a comprehensive framework for adapting and enhancing LLMs to address the unique challenges and requirements of wireless communication networks. We first identify three foundational principles that underpin WirelessLLM: knowledge alignment, knowledge fusion, and knowledge evolution. Then, we investigate the enabling technologies to build WirelessLLM, including prompt engineering, retrieval augmented generation, tool usage, multi-modal pre-training, and domain-specific fine-tuning. Moreover, we present three case studies to demonstrate the practical applicability and benefits of WirelessLLM for solving typical problems in wireless networks. Finally, we conclude this paper by highlighting key challenges and outlining potential avenues for future research. △ Less

Submitted 15 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16851 [pdf, other]

Temporal Spiking Neural Networks with Synaptic Delay for Graph Reasoning

Authors: Mingqing Xiao, Yixin Zhu, Di He, Zhouchen Lin

Abstract: Spiking neural networks (SNNs) are investigated as biologically inspired models of neural computation, distinguished by their computational capability and energy efficiency due to precise spiking times and sparse spikes with event-driven computation. A significant question is how SNNs can emulate human-like graph-based reasoning of concepts and relations, especially leveraging the temporal domain… ▽ More Spiking neural networks (SNNs) are investigated as biologically inspired models of neural computation, distinguished by their computational capability and energy efficiency due to precise spiking times and sparse spikes with event-driven computation. A significant question is how SNNs can emulate human-like graph-based reasoning of concepts and relations, especially leveraging the temporal domain optimally. This paper reveals that SNNs, when amalgamated with synaptic delay and temporal coding, are proficient in executing (knowledge) graph reasoning. It is elucidated that spiking time can function as an additional dimension to encode relation properties via a neural-generalized path formulation. Empirical results highlight the efficacy of temporal delay in relation processing and showcase exemplary performance in diverse graph reasoning tasks. The spiking model is theoretically estimated to achieve $20\times$ energy savings compared to non-spiking counterparts, deepening insights into the capabilities and potential of biologically inspired SNNs for efficient reasoning. The code is available at https://github.com/pkuxmq/GRSNN. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024

arXiv:2405.16618 [pdf, other]

An efficient optimization model and tabu search-based global optimization approach for continuous p-dispersion problem

Authors: Xiang**g Lai, Zhenheng Lin, **-Kao Hao, Qinghua Wu

Abstract: Continuous p-dispersion problems with and without boundary constraints are NP-hard optimization problems with numerous real-world applications, notably in facility location and circle packing, which are widely studied in mathematics and operations research. In this work, we concentrate on general cases with a non-convex multiply-connected region that are rarely studied in the literature due to the… ▽ More Continuous p-dispersion problems with and without boundary constraints are NP-hard optimization problems with numerous real-world applications, notably in facility location and circle packing, which are widely studied in mathematics and operations research. In this work, we concentrate on general cases with a non-convex multiply-connected region that are rarely studied in the literature due to their intractability and the absence of an efficient optimization model. Using the penalty function approach, we design a unified and almost everywhere differentiable optimization model for these complex problems and propose a tabu search-based global optimization (TSGO) algorithm for solving them. Computational results over a variety of benchmark instances show that the proposed model works very well, allowing popular local optimization methods (e.g., the quasi-Newton methods and the conjugate gradient methods) to reach high-precision solutions due to the differentiability of the model. These results further demonstrate that the proposed TSGO algorithm is very efficient and significantly outperforms several popular global optimization algorithms in the literature, improving the best-known solutions for several existing instances in a short computational time. Experimental analyses are conducted to show the influence of several key ingredients of the algorithm on computational performance. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16256 [pdf, other]

HetHub: A Heterogeneous distributed hybrid training system for large-scale models

Authors: Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Haolin Ye, Sipei Gu, Chunsheng Shui, Zhezheng Lin, Hao Zhang, Sheng Wang, Guohao Dai, Yu Wang

Abstract: The development of large-scale models relies on a vast number of computing resources. For example, the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs for its training. It is a challenge to build a large-scale cluster with a type of GPU-accelerator. Using multiple types of GPU-accelerators to construct a cluster is an effective way to solve the problem of insufficient homogeneous GP… ▽ More The development of large-scale models relies on a vast number of computing resources. For example, the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs for its training. It is a challenge to build a large-scale cluster with a type of GPU-accelerator. Using multiple types of GPU-accelerators to construct a cluster is an effective way to solve the problem of insufficient homogeneous GPU-accelerators. However, the existing distributed training systems for large-scale models only support homogeneous GPU-accelerators, not heterogeneous GPU-accelerators. To address the problem, this paper proposes a distributed training system with hybrid parallelism support on heterogeneous GPU-accelerators for large-scale models. It introduces a distributed unified communicator to realize the communication between heterogeneous GPU-accelerators, a distributed performance predictor, and an automatic hybrid parallel module to develop and train models efficiently with heterogeneous GPU-accelerators. Compared to the distributed training system with homogeneous GPU-accelerators, our system can support six different combinations of heterogeneous GPU-accelerators and the optimal performance of heterogeneous GPU-accelerators has achieved at least 90% of the theoretical upper bound performance of homogeneous GPU-accelerators. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16041 [pdf, other]

Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models

Authors: Zhenzhong Wang, Zehui Lin, Wanyu Lin, Ming Yang, Minggang Zeng, Kay Chen Tan

Abstract: Providing explainable molecule property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop… ▽ More Providing explainable molecule property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop a new framework for explainable molecular property prediction based on language models, dubbed as Lamole, which can provide chemical concepts-aligned explanations. We first leverage a designated molecular representation -- the Group SELFIES -- as it can provide chemically meaningful semantics. Because attention mechanisms in Transformers can inherently capture relationships within the input, we further incorporate the attention weights and gradients together to generate explanations for capturing the functional group interactions. We then carefully craft a marginal loss to explicitly optimize the explanations to be able to align with the chemists' annotations. We bridge the manifold hypothesis with the elaborated marginal loss to prove that the loss can align the explanations with the tangent space of the data manifold, leading to concept-aligned explanations. Experimental results over six mutagenicity datasets and one hepatotoxicity dataset demonstrate Lamole can achieve comparable classification accuracy and boost the explanation accuracy by up to 14.8%, being the state-of-the-art in explainable molecular property prediction. △ Less

Submitted 31 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15705 [pdf, other]

Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates

Authors: **bo Peng, Zhe Chen, Zheng Lin, Haoxuan Yuan, Zihan Fang, Lingzhong Bao, Zihang Song, Ying Li, **g Ren, Yue Gao

Abstract: Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals distribute in a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer for extracting physical layer information, such as structure of packet, with low sampling rate (especially, sub-Nyquist… ▽ More Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals distribute in a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer for extracting physical layer information, such as structure of packet, with low sampling rate (especially, sub-Nyquist sampling) can significantly improve their cost- and energy-efficiency. However, to achieve a multiband signals sniffer is really a challenge. To this end, we propose Sums, a system that can sniff and analyze multiband signals in a blind manner. Our Sums takes advantage of hardware and algorithm co-design, multi-coset sub-Nyquist sampling hardware, and a multi-task deep learning framework. The hardware component breaks the Nyquist rule to sample GHz bandwidth, but only pays for a 50 MSPS sampling rate. Our multi-task learning framework directly tackles the sampling data to perform spectrum sensing, physical layer protocol recognition, and demodulation for deep inspection from multiband signals. Extensive experiments demonstrate that Sums achieves higher accuracy than the state-of-theart baselines in spectrum sensing, modulation classification, and demodulation. As a result, our Sums can help researchers and end-users to diagnose or troubleshoot their problems of wireless infrastructures deployments in practice. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 12 pages, 9 figures

arXiv:2405.15542 [pdf, other]

SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing

Authors: Haoxuan Yuan, Zhe Chen, Zheng Lin, **bo Peng, Zihan Fang, Yuhang Zhong, Zihang Song, Yue Gao

Abstract: Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, limited spectrum resources will not be allocated enough. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential.… ▽ More Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, limited spectrum resources will not be allocated enough. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential. However, spectrum sensing in space is more challenging than in terrestrial networks due to variable channel conditions, making single-satellite sensing unstable. Therefore, we first attempt to design a collaborative sensing scheme utilizing diverse data from multiple satellites. However, it is non-trivial to achieve this collaboration due to heterogeneous channel quality, considerable raw sampling data, and packet loss. To address the above challenges, we first establish connections between the satellites by modeling their sensing data as a graph and devising a graph neural network-based algorithm to achieve effective spectrum sensing. Meanwhile, we establish a joint sub-Nyquist sampling and autoencoder data compression framework to reduce the amount of transmitted sensing data. Finally, we propose a contrastive learning-based mechanism compensates for missing packets. Extensive experiments demonstrate that our proposed strategy can achieve efficient spectrum sensing performance and outperform the conventional deep learning algorithm in spectrum sensing accuracy. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 13 pages, 16 figures

arXiv:2405.15208 [pdf, other]

Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

Authors: Chenxi Sun, Hongzhi Zhang, Zijia Lin, **gyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong

Abstract: Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the d… ▽ More Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the decoding process without sacrificing output quality. The core of our approach is the observation that a pre-trained language model can confidently predict multiple contiguous tokens, forming the basis for a \textit{lexical unit}, in which these contiguous tokens could be decoded in parallel. Extensive experiments validate that our method substantially reduces decoding time while maintaining generation quality, i.e., 33\% speed up on natural language generation with no quality loss, and 30\% speed up on code generation with a negligible quality loss of 3\%. Distinctively, LUD requires no auxiliary models and does not require changes to existing architectures. It can also be integrated with other decoding acceleration methods, thus achieving an even more pronounced inference efficiency boost. We posit that the foundational principles of LUD could define a new decoding paradigm for future language models, enhancing their applicability for a broader spectrum of applications. All codes are be publicly available at https://github.com/tjunlp-lab/Lexical-Unit-Decoding-LUD-. Keywords: Parallel Decoding, Lexical Unit Decoding, Large Language Model △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted for publication at LREC-COLING 2024

arXiv:2405.14854 [pdf, other]

TerDiT: Ternary Diffusion Models with Transformers

Authors: Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan, Hongsheng Li

Abstract: Recent developments in large-scale pre-trained text-to-image diffusion models have significantly improved the generation of high-fidelity images, particularly with the emergence of diffusion models based on transformer architecture (DiTs). Among these diffusion models, diffusion transformers have demonstrated superior image generation capabilities, boosting lower FID scores and higher scalability.… ▽ More Recent developments in large-scale pre-trained text-to-image diffusion models have significantly improved the generation of high-fidelity images, particularly with the emergence of diffusion models based on transformer architecture (DiTs). Among these diffusion models, diffusion transformers have demonstrated superior image generation capabilities, boosting lower FID scores and higher scalability. However, deploying large-scale DiT models can be expensive due to their extensive parameter numbers. Although existing research has explored efficient deployment techniques for diffusion models such as model quantization, there is still little work concerning DiT-based models. To tackle this research gap, in this paper, we propose TerDiT, a quantization-aware training (QAT) and efficient deployment scheme for ternary diffusion models with transformers. We focus on the ternarization of DiT networks and scale model sizes from 600M to 4.2B. Our work contributes to the exploration of efficient deployment strategies for large-scale DiT models, demonstrating the feasibility of training extremely low-bit diffusion transformer models from scratch while maintaining competitive image generation capacities compared to full-precision models. Code will be available at https://github.com/Lucky-Lance/TerDiT. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: 18 pages, 13 figures

arXiv:2405.14458 [pdf, other]

YOLOv10: Real-Time End-to-End Object Detection

Authors: Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding

Abstract: Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on the non-maximum sup… ▽ More Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components in YOLOs lacks the comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability. It renders the suboptimal efficiency, along with considerable potential for performance improvements. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and model architecture. To this end, we first present the consistent dual assignments for NMS-free training of YOLOs, which brings competitive performance and low inference latency simultaneously. Moreover, we introduce the holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both efficiency and accuracy perspectives, which greatly reduces the computational overhead and enhances the capability. The outcome of our effort is a new generation of YOLO series for real-time end-to-end object detection, dubbed YOLOv10. Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8$\times$ faster than RT-DETR-R18 under the similar AP on COCO, meanwhile enjoying 2.8$\times$ smaller number of parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46\% less latency and 25\% fewer parameters for the same performance. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Code: https://github.com/THU-MIG/yolov10

arXiv:2405.13582 [pdf, other]

Dual-Capability Machine Learning Models for Quantum Hamiltonian Parameter Estimation and Dynamics Prediction

Authors: Zheng An, Jiahui Wu, Zidong Lin, Xiaobo Yang, Keren Li, Bei Zeng

Abstract: Recent advancements in quantum hardware and classical computing simulations have significantly enhanced the accessibility of quantum system data, leading to an increased demand for precise descriptions and predictions of these systems. Accurate prediction of quantum Hamiltonian dynamics and identification of Hamiltonian parameters are crucial for advancements in quantum simulations, error correcti… ▽ More Recent advancements in quantum hardware and classical computing simulations have significantly enhanced the accessibility of quantum system data, leading to an increased demand for precise descriptions and predictions of these systems. Accurate prediction of quantum Hamiltonian dynamics and identification of Hamiltonian parameters are crucial for advancements in quantum simulations, error correction, and control protocols. This study introduces a machine learning model with dual capabilities: it can deduce time-dependent Hamiltonian parameters from observed changes in local observables within quantum many-body systems, and it can predict the evolution of these observables based on Hamiltonian parameters. Our model's validity was confirmed through theoretical simulations across various scenarios and further validated by two experiments. Initially, the model was applied to a Nuclear Magnetic Resonance quantum computer, where it accurately predicted the dynamics of local observables. The model was then tested on a superconducting quantum computer with initially unknown Hamiltonian parameters, successfully inferring them. Our approach aims to enhance various quantum computing tasks, including parameter estimation, noise characterization, feedback processes, and quantum control optimization. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: 19 pages, 14 figures

arXiv:2405.12827 [pdf, other]

On an impulsive faecal-oral model in a periodically evolving environment

Authors: Qi Zhou, Zhigui Lin, Carlos Alberto Santos

Abstract: To understand how impulsive intervention and regional evolution jointly influence the spread of faecal-oral diseases, this paper develops an impulsive faecal-oral model in a periodically evolving environment. The well-posedness of the model is first checked. Then, the existence of the principal eigenvalue dependent on impulse intensity and evolving rate is proved based on Krein-Rutman theorem. Wit… ▽ More To understand how impulsive intervention and regional evolution jointly influence the spread of faecal-oral diseases, this paper develops an impulsive faecal-oral model in a periodically evolving environment. The well-posedness of the model is first checked. Then, the existence of the principal eigenvalue dependent on impulse intensity and evolving rate is proved based on Krein-Rutman theorem. With the help of this value, the threshold dynamical behaviours of the model are explored. More importantly, this paper also derives the monotonicity of the principal eigenvalue with respect to initial region and impulse intensity and estimates the principal eigenvalue in some special cases. Finally, numerical simulations are used to verify the correctness of the theoretical results and to explore the impact of regional evolution rate on the spread of the diseases. Our research shows that large impulsive intensity $1-g'(0)$ and small evolving rate $ρ(t)$ play a positive role in the prevention and control of the diseases. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 29 pages, 8 figures

MSC Class: 35K57; 35R12; 92B05

arXiv:2405.12706 [pdf, other]

Disentangled Representation with Cross Experts Covariance Loss for Multi-Domain Recommendation

Authors: Zhutian Lin, Junwei Pan, Haibin Yu, Xi Xiao, Ximei Wang, Zhixiang Feng, Shifeng Wen, Shudong Huang, Lei Xiao, Jie Jiang

Abstract: Multi-domain learning (MDL) has emerged as a prominent research area aimed at enhancing the quality of personalized services. The key challenge in MDL lies in striking a balance between learning commonalities across domains while preserving the distinct characteristics of each domain. However, this gives rise to a challenging dilemma. On one hand, a model needs to leverage domain-specific modules,… ▽ More Multi-domain learning (MDL) has emerged as a prominent research area aimed at enhancing the quality of personalized services. The key challenge in MDL lies in striking a balance between learning commonalities across domains while preserving the distinct characteristics of each domain. However, this gives rise to a challenging dilemma. On one hand, a model needs to leverage domain-specific modules, such as experts or embeddings, to preserve the uniqueness of each domain. On the other hand, due to the long-tailed distributions observed in real-world domains, some tail domains may lack sufficient samples to fully learn their corresponding modules. Unfortunately, existing approaches have not adequately addressed this dilemma. To address this issue, we propose a novel model called Crocodile, which stands for Cross-experts Covariance Loss for Disentangled Learning. Crocodile adopts a multi-embedding paradigm to facilitate model learning and employs a Covariance Loss on these embeddings to disentangle them. This disentanglement enables the model to capture diverse user interests across domains effectively. Additionally, we introduce a novel gating mechanism to further enhance the capabilities of Crocodile. Through empirical analysis, we demonstrate that our proposed method successfully resolves these two challenges and outperforms all state-of-the-art methods on publicly available datasets. We firmly believe that the analytical perspectives and design concept of disentanglement presented in our work can pave the way for future research in the field of MDL. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12584 [pdf, other]

Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?

Authors: Ziqin Lin, Heng Li, Zinan Li, Huazhu Fu, Jiang Liu

Abstract: Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in the domain of medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (VIT) and a self-supe… ▽ More Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in the domain of medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (VIT) and a self-supervised learning framework. This LFM has shown promising performance in fundus disease diagnosis across multiple datasets. On the other hand, deep learning models have long been challenged by dataset quality issues, such as image quality and dataset bias. To investigate the influence of data quality on LFM, we conducted explorations in two fundus diagnosis tasks using datasets of varying quality. Specifically, we explored the following questions: Is LFM more robust to image quality? Is LFM affected by dataset bias? Can fine-tuning techniques alleviate these effects? Our investigation found that LFM exhibits greater resilience to dataset quality issues, including image quality and dataset bias, compared to typical convolutional networks. Furthermore, we discovered that overall fine-tuning is an effective adapter for LFM to mitigate the impact of dataset quality issues. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures

arXiv:2405.10895 [pdf, other]

The unluckiest star: A spectroscopically confirmed repeated partial tidal disruption event AT 2022dbl

Authors: Zheyu Lin, Ning Jiang, Tinggui Wang, Xu Kong, Dongyue Li, Han He, Yibo Wang, Jiazheng Zhu, Wentao Li, Ji-an Jiang, Avinash Singh, Rishabh Singh Teja, D. K. Sahu, Chichuan **, Keiichi Maeda, Shifeng Huang

Abstract: The unluckiest star orbits a supermassive black hole elliptically. Every time it reaches the pericenter, it shallowly enters the tidal radius and gets partially tidal disrupted, producing a series of flares. Confirmation of a repeated partial tidal disruption event (pTDE) requires not only evidence to rule out other types of transients, but also proof that only one star is involved, as TDEs from m… ▽ More The unluckiest star orbits a supermassive black hole elliptically. Every time it reaches the pericenter, it shallowly enters the tidal radius and gets partially tidal disrupted, producing a series of flares. Confirmation of a repeated partial tidal disruption event (pTDE) requires not only evidence to rule out other types of transients, but also proof that only one star is involved, as TDEs from multiple stars can also produce similar flares. In this letter, we report the discovery of a repeated pTDE, AT 2022dbl. In a quiescent galaxy at z=0.0284, two separate optical/UV flares have been observed in 2022 and 2024, with no bright X-ray, radio or mid-infrared counterparts. Compared to the first flare, the second flare has a similar blackbody temperature of ~26,000 K, slightly lower peak luminosity, and slower rise and fall phases. Compared to the ZTF TDEs, their blackbody parameters, bolometric energies and light curve shapes are all similar. The spectra taken during the second flare show a steeper continuum than the late-time spectra of the previous flare, consistent with a newly risen flare. More importantly, the possibility of two independent TDEs can be largely ruled out because the optical spectra taken around the peak of the two flares exhibit highly similar broad Balmer, N III and possible He II emission lines, especially the extreme ~4100Å emission lines. This represents the first robust spectroscopic evidence for a repeated pTDE, which can soon be verified by observing the third flare, given its short orbital period. △ Less

Submitted 17 May, 2024; originally announced May 2024.

Comments: 15 pages, 8 figures, submitted to ApJ Letters on 2024 Apr 27

arXiv:2405.10292 [pdf, other]

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Authors: Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine

Abstract: Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic… ▽ More Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic framework that fine-tunes VLMs with reinforcement learning (RL). Specifically, our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning, enabling the VLM to efficiently explore intermediate reasoning steps that lead to the final text-based action. Next, the open-ended text output is parsed into an executable action to interact with the environment to obtain goal-directed task rewards. Finally, our framework uses these task rewards to fine-tune the entire VLM with RL. Empirically, we demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks, enabling 7b models to outperform commercial models such as GPT4-V or Gemini. Furthermore, we find that CoT reasoning is a crucial component for performance improvement, as removing the CoT reasoning results in a significant decrease in the overall performance of our method. △ Less

Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.10148 [pdf, other]

SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network

Authors: Zhaoxu Li, Wei An, Gaowei Guo, Longguang Wang, Yingqian Wang, Zai** Lin

Abstract: Hyperspectral target detection (HTD) aims to identify specific materials based on spectral information in hyperspectral imagery and can detect point targets, some of which occupy a smaller than one-pixel area. However, existing HTD methods are developed based on per-pixel binary classification, which limits the feature representation capability for point targets. In this paper, we rethink the hype… ▽ More Hyperspectral target detection (HTD) aims to identify specific materials based on spectral information in hyperspectral imagery and can detect point targets, some of which occupy a smaller than one-pixel area. However, existing HTD methods are developed based on per-pixel binary classification, which limits the feature representation capability for point targets. In this paper, we rethink the hyperspectral point target detection from the object detection perspective, and focus more on the object-level prediction capability rather than the pixel classification capability. Inspired by the token-based processing flow of Detection Transformer (DETR), we propose the first specialized network for hyperspectral multi-class point object detection, SpecDETR. Without the backbone part of the current object detection framework, SpecDETR treats the spectral features of each pixel in hyperspectral images as a token and utilizes a multi-layer Transformer encoder with local and global coordination attention modules to extract deep spatial-spectral joint features. SpecDETR regards point object detection as a one-to-many set prediction problem, thereby achieving a concise and efficient DETR decoder that surpasses the current state-of-the-art DETR decoder in terms of parameters and accuracy in point object detection. We develop a simulated hyperSpectral Point Object Detection benchmark termed SPOD, and for the first time, evaluate and compare the performance of current object detection networks and HTD methods on hyperspectral multi-class point object detection. SpecDETR demonstrates superior performance as compared to current object detection networks and HTD methods on the SPOD dataset. Additionally, we validate on a public HTD dataset that by using data simulation instead of manual annotation, SpecDETR can detect real-world single-spectral point objects directly. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09970 [pdf, ps, other]

On the Cut Elimination of Weak Intuitionistic Tense Logic

Authors: Yiheng Wang, Yu Peng, Zhe Lin

Abstract: In this paper, we use a new method to prove cut-elimination of weak intuitionistic tense logic. This method focuses on splitting the contraction rule and cut rules. Further general theories and applications of this method shall be developed in the future. In this paper, we use a new method to prove cut-elimination of weak intuitionistic tense logic. This method focuses on splitting the contraction rule and cut rules. Further general theories and applications of this method shall be developed in the future. △ Less

Submitted 27 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09863 [pdf, other]

Box-Free Model Watermarks Are Prone to Black-Box Removal Attacks

Authors: Haonan An, Guang Hua, Zhi** Lin, Yuguang Fang

Abstract: Box-free model watermarking is an emerging technique to safeguard the intellectual property of deep learning models, particularly those for low-level image processing tasks. Existing works have verified and improved its effectiveness in several aspects. However, in this paper, we reveal that box-free model watermarking is prone to removal attacks, even under the real-world threat model such that t… ▽ More Box-free model watermarking is an emerging technique to safeguard the intellectual property of deep learning models, particularly those for low-level image processing tasks. Existing works have verified and improved its effectiveness in several aspects. However, in this paper, we reveal that box-free model watermarking is prone to removal attacks, even under the real-world threat model such that the protected model and the watermark extractor are in black boxes. Under this setting, we carry out three studies. 1) We develop an extractor-gradient-guided (EGG) remover and show its effectiveness when the extractor uses ReLU activation only. 2) More generally, for an unknown extractor, we leverage adversarial attacks and design the EGG remover based on the estimated gradients. 3) Under the most stringent condition that the extractor is inaccessible, we design a transferable remover based on a set of private proxy models. In all cases, the proposed removers can successfully remove embedded watermarks while preserving the quality of the processed images, and we also demonstrate that the EGG remover can even replace the watermarks. Extensive experimental results verify the effectiveness and generalizability of the proposed attacks, revealing the vulnerabilities of the existing box-free methods and calling for further research. △ Less

Submitted 21 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09465 [pdf, other]

Flashback: Enhancing Proposer-Builder Design with Future-Block Auctions in Proof-of-Stake Ethereum

Authors: Yifan Mao, Mengya Zhang, Shaileshh Bojja Venkatakrishnan, Zhiqiang Lin

Abstract: Maximal extractable value (MEV) in which block proposers unethically gain profits by manipulating the order in which transactions are included within a block, is a key challenge facing blockchains such as Ethereum today. Left unchecked, MEV can lead to a centralization of stake distribution thereby ultimately compromising the security of blockchain consensus. To preserve proposer decentralization… ▽ More Maximal extractable value (MEV) in which block proposers unethically gain profits by manipulating the order in which transactions are included within a block, is a key challenge facing blockchains such as Ethereum today. Left unchecked, MEV can lead to a centralization of stake distribution thereby ultimately compromising the security of blockchain consensus. To preserve proposer decentralization (and hence security) of the blockchain, Ethereum has advocated for a proposer-builder separation (PBS) in which the functionality of transaction ordering is separated from proposers and assigned to separate entities called builders. Builders accept transaction bundles from searchers, who compete to find the most profitable bundles. Builders then bid completed blocks to proposers, who accept the most profitable blocks for publication. The auction mechanisms used between searchers, builders and proposers are crucial to the overall health of the blockchain. In this paper, we consider PBS design in Ethereum as a game between searchers, builders and proposers. A key novelty in our design is the inclusion of future block proposers, as all proposers of an epoch are decided ahead of time in proof-of-stake (PoS) Ethereum within the game model. Our analysis shows the existence of alternative auction mechanisms that result in a better (more profitable) equilibrium to players compared to state-of-the-art. Experimental evaluations based on synthetic and real-world data traces corroborate the analysis. Our results highlight that a rethinking of auction mechanism designs is necessary in PoS Ethereum to prevent disruption. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.06174 [pdf, other]

Observation of a $p$-orbital higher-order topological insulator phase in puckered lattice acoustic metamaterials

Authors: Bing-Quan Wu, Zhi-Kang Lin, Li-Wei Wang, Jian-Hua Jiang

Abstract: The puckered lattice geometry, along with $p$-orbitals is often overlooked in the study of topological physics. Here, we investigate the higher-order topology of the $p_{x,y}$-orbital bands in acoustic metamaterials using a simplified two-dimensional phosphorene lattice which possesses a puckered structure. Notably, unlike the $s$-orbital bands in planar lattices, the unique higher-order topology… ▽ More The puckered lattice geometry, along with $p$-orbitals is often overlooked in the study of topological physics. Here, we investigate the higher-order topology of the $p_{x,y}$-orbital bands in acoustic metamaterials using a simplified two-dimensional phosphorene lattice which possesses a puckered structure. Notably, unlike the $s$-orbital bands in planar lattices, the unique higher-order topology observed here is specific to $p$-orbitals and the puckered geometry due to the unusual hop** patterns induced by them. {Using acoustic pump-probe measurements in metamaterials}, we confirm the emergence of the edge and corner states arising due to the unconventional higher-order topology. We reveal the uniqueness of the higher-order topological physics here via complimentary tight-binding calculations, finite-element simulations, and acoustic experiments. We analyze the underlying physics of the special properties of the edge and corner states in the puckered lattice acoustic metamaterials from the picture of Wannier orbitals. Our work sheds light on the intriguing physics of $p$-orbital topological physics in puckered lattices and acoustic metamaterials which lead to unconventional topological boundary states. \end{abstract} △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted by Phys. Rev. B

arXiv:2405.06170 [pdf]

doi 10.1103/PhysRevB.108.195126

Non-Hermitian topological phases and skin effects in kagome lattices

Authors: Li-Wei Wang, Zhi-Kang Lin, Jian-Hua Jiang

Abstract: Non-Hermitian physics has added new ingredients to topological physics, leading to the rising frontier of non-Hermitian topological phases. In this study, we investigate Chern insulator phases emerging from non-Hermitian kagome models with non-reciprocal and pure imaginary next-nearest neighbor hop**s. In the presence or absence of $C_3$ rotation symmetry, hybrid topological-skin effects are exp… ▽ More Non-Hermitian physics has added new ingredients to topological physics, leading to the rising frontier of non-Hermitian topological phases. In this study, we investigate Chern insulator phases emerging from non-Hermitian kagome models with non-reciprocal and pure imaginary next-nearest neighbor hop**s. In the presence or absence of $C_3$ rotation symmetry, hybrid topological-skin effects are explored through the identification of distinct corner skin modes in different energy regions within two band gaps. By employing the dynamical analysis, the underlying physics is revealed from the non-Hermitian skin effects associated with the chiral edge states, leading to diverse non-Hermitian bulk-boundary responses. The simplicity of these kagome models and their rich emergent topological phenomena suggest that they are appealing candidates for studying non-Hermitian topological phases. We further discuss the possible realizations of these models in non-Hermitian metamaterials. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Journal ref: Phys. Rev. B 108, 195126 (2023)

arXiv:2405.05975 [pdf, other]

Deep-learning design of graphene metasurfaces for quantum control and Dirac electron holography

Authors: Chen-Di Han, Li-Li Ye, Zin Lin, Vassilios Kovanis, Ying-Cheng Lai

Abstract: Metasurfaces are sub-wavelength patterned layers for controlling waves in physical systems. In optics, meta-surfaces are created by materials with different dielectric constants and are capable of unconventional functionalities. We develop a deep-learning framework for Dirac-material metasurface design for controlling electronic waves. The metasurface is a configuration of circular graphene quantu… ▽ More Metasurfaces are sub-wavelength patterned layers for controlling waves in physical systems. In optics, meta-surfaces are created by materials with different dielectric constants and are capable of unconventional functionalities. We develop a deep-learning framework for Dirac-material metasurface design for controlling electronic waves. The metasurface is a configuration of circular graphene quantum dots, each created by an electric potential. Employing deep convolutional neural networks, we show that the original scattering wave can be reconstructed with fidelity over 95$\%$, suggesting the feasibility of Dirac electron holography. Additional applications such as plane wave generation, designing broadband, and multi-functionality graphene metasurface systems are illustrated. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: 13 pages, 9 figures

arXiv:2405.05803 [pdf, other]

Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference

Authors: Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji

Abstract: Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention s… ▽ More Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention sink phenomenon that is prevalent in LLMs also persists in MLLMs, suggesting that initial tokens and nearest tokens receive the majority of attention, while middle vision tokens garner minimal attention in deep layers; (2) the presence of information migration, which implies that visual information is transferred to subsequent text tokens within the first few layers of MLLMs. As per our findings, we conclude that vision tokens are not necessary in the deep layers of MLLMs. Thus, we strategically withdraw them at a certain layer, enabling only text tokens to engage in subsequent layers. To pinpoint the ideal layer for vision tokens withdrawal, we initially analyze a limited set of tiny datasets and choose the first layer that meets the Kullback-Leibler divergence criterion. Our VTW approach can cut computational overhead by over 40\% across diverse multimodal tasks while maintaining performance. Our code is released at https://github.com/lzhxmu/VTW. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05252 [pdf, other]

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable… ▽ More Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage: https://atedm.github.io. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2405.04342 [pdf, other]

The Curse of Diversity in Ensemble-Based Exploration

Authors: Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin, Aaron Courville

Abstract: We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated d… ▽ More We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency of the individual ensemble members to learn from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL) in both discrete and continuous control domains. Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: Published as a conference paper at ICLR 2024

arXiv:2405.04332 [pdf, other]

WALLETRADAR: Towards Automating the Detection of Vulnerabilities in Browser-based Cryptocurrency Wallets

Authors: Pengcheng Xia, Yanhui Guo, Zhaowen Lin, Jun Wu, Pengbo Duan, Ningyu He, Kailong Wang, Tianming Liu, Yinliang Yue, Guoai Xu, Haoyu Wang

Abstract: Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion accompanies security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive se… ▽ More Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion accompanies security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive security analysis but also a pressing need for specialized tools that can aid developers in reducing vulnerabilities during the development process. To fill the void, we present a comprehensive security analysis of browser-based wallets in this paper, along with the development of an automated tool designed for this purpose. We first compile a taxonomy of security vulnerabilities resident in cryptocurrency wallets by harvesting historical security reports. Based on this, we design WALLETRADAR, an automated detection framework that can accurately identify security issues based on static and dynamic analysis. Evaluation of 96 popular browser-based wallets shows WALLETRADAR's effectiveness, by successfully automating the detection process in 90% of these wallets with high precision. This evaluation has led to the discovery of 116 security vulnerabilities corresponding to 70 wallets. By the time of this paper, we have received confirmations of 10 vulnerabilities from 8 wallet developers, with over $2,000 bug bounties. Further, we observed that 12 wallet developers have silently fixed 16 vulnerabilities after our disclosure. WALLETRADAR can effectively automate the identification of security risks in cryptocurrency wallets, thereby enhancing software development quality and safety in the blockchain ecosystem. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: Just accepted by the Automated Software Engineering Journal

arXiv:2405.04269

An Analysis of Sea Level Spatial Variability by Topological Indicators and $k$-means Clustering Algorithm

Authors: Zixin Lin, Nur Fariha Syaqina Zulkepli, Mohd Shareduwan Mohd Kasihmuddin, R. U. Gobithaasan

Abstract: The time-series data of sea level rise and fall contains crucial information on the variability of sea level patterns. Traditional $k$-means clustering is commonly used for categorizing regional variability of sea level, however, its results are not robust against a number of factors. This study analyzed fourteen datasets of monthly sea level in fourteen shoreline regions of Peninsular Malaysia. W… ▽ More The time-series data of sea level rise and fall contains crucial information on the variability of sea level patterns. Traditional $k$-means clustering is commonly used for categorizing regional variability of sea level, however, its results are not robust against a number of factors. This study analyzed fourteen datasets of monthly sea level in fourteen shoreline regions of Peninsular Malaysia. We applied a hybridization of clustering technique to analyze data categorization and topological data analysis method to enhance the performance of our clustering analysis. Specifically, our approach utilized the persistent homology and $k$-means/$k$-means++ clustering. The fourteen data sets from fourteen tide gauge stations were categorized in classes based on a prior categorization that was determined by topological information, and the probability of data points that belong to certain groups that is yielded by $k$-means/$k$-means++ clustering. Our results demonstrated that our method significantly improves the performance of traditional clustering techniques. △ Less

Submitted 13 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: There are some mistakes in the submission, and it needs major revision

arXiv:2405.04086 [pdf, other]

Optimizing Language Model's Reasoning Abilities with Weak Supervision

Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, **gbo Shang

Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities w… ▽ More While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities with minimal human supervision. In this work, we introduce self-reinforcement, which begins with Supervised Fine-Tuning (SFT) of the model using a small collection of annotated questions. Then it iteratively improves LLMs by learning from the differences in responses from the SFT and unfinetuned models on unlabeled questions. Our approach provides an efficient approach without relying heavily on extensive human-annotated explanations. However, current reasoning benchmarks typically only include golden-reference answers or rationales. Therefore, we present \textsc{PuzzleBen}, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales across various domains, such as brainteasers, puzzles, riddles, parajumbles, and critical reasoning tasks. A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing fewer supersized data to boost LLMs' inference capabilities. Our experiments underscore the significance of \textsc{PuzzleBen}, as well as the effectiveness of our methodology as a promising direction in future endeavors. Our dataset and code will be published soon on \texttt{Anonymity Link}. △ Less

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03990 [pdf, other]

TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks

Authors: Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang

Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observat… ▽ More Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $\left(1-ε\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by develo** a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models. △ Less

Submitted 19 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 11 pages, 7 figures. This paper has been accepted by ICDCS 2024. The extended version of this paper is at arXiv:2404.14204

arXiv:2405.03613 [pdf, other]

Dual Relation Mining Network for Zero-Shot Learning

Authors: **wei Han, Yingguo Gao, Zhiwen Lin, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

Abstract: Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods have exhibited significant progress which align visual features and attributes via a spatial attention mechanism. However, these methods only explore visual-semantic relationship in the spatial dimension, w… ▽ More Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods have exhibited significant progress which align visual features and attributes via a spatial attention mechanism. However, these methods only explore visual-semantic relationship in the spatial dimension, which can lead to classification ambiguity when different attributes share similar attention regions, and semantic relationship between attributes is rarely discussed. To alleviate the above problems, we propose a Dual Relation Mining Network (DRMN) to enable more effective visual-semantic interactions and learn semantic relationship among attributes for knowledge transfer. Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion and conducts spatial attention for visual to semantic embedding. Moreover, an attribute-guided channel attention is utilized to decouple entangled semantic features. For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images. Additionally, a global classification branch is introduced as a complement to human-defined semantic attributes, and we then combine the results with attribute-based classification. Extensive experiments demonstrate that the proposed DRMN leads to new state-of-the-art performances on three standard ZSL benchmarks, i.e., CUB, SUN, and AwA2. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.02057 [pdf]

Probing fragile topology with a screw dislocation

Authors: Ying Wu, Zhi-Kang Lin, Yating Yang, Zhida Song, Feng Li, Jian-Hua Jiang

Abstract: Fragile topology, akin to twisted bilayer graphene and the exotic phases therein, is a notable topological class with intriguing properties. However, due to its unique nature and the lack of bulk-edge correspondence, the experimental signature of fragile topology has been under debated since its birth. Here, we demonstrate experimentally that fragile topological phases with filling anomaly can be… ▽ More Fragile topology, akin to twisted bilayer graphene and the exotic phases therein, is a notable topological class with intriguing properties. However, due to its unique nature and the lack of bulk-edge correspondence, the experimental signature of fragile topology has been under debated since its birth. Here, we demonstrate experimentally that fragile topological phases with filling anomaly can be probed via screw dislocations, despite that they do not support gapless edge states. Using a designer hexagonal phononic crystal with a fragile topological band gap, we find that 1D gapless bound modes can emerge at a screw dislocation due to the bulk fragile topology. We then establish a connection between our system and the twisted boundary condition via the gauge invariance principle and illustrate that such an emergent phenomenon is an intrinsic property of fragile topological phases with filling anomaly. We observe experimentally the 1D topological bound states using the pump-probe measurements of their dispersion and wavefunctions, which unveils a novel bulk-defect correspondence of fragile topology and a powerful tool for probing fragile topological phases and materials. △ Less

Submitted 3 May, 2024; originally announced May 2024.

Comments: Submitted to Science Bulletin

arXiv:2405.01851 [pdf, other]

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

Abstract: There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been e… ▽ More There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.00954 [pdf, other]

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

Authors: Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji

Abstract: Recent advancements in automatic 3D avatar generation guided by text have made significant progress. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, simplif… ▽ More Recent advancements in automatic 3D avatar generation guided by text have made significant progress. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, simplifying optimization through step-by-step generation. To tackle oversaturation, we introduce Adaptive Variational Parameter (AVP), representing avatars as an adaptive distribution during training. Additionally, we present Avatar-aware Score Distillation Sampling (ASDS), a novel technique that incorporates avatar-aware noise into rendered images for improved generation quality during optimization. Extensive evaluations confirm the superiority of X-Oscar over existing text-to-3D and text-to-avatar approaches. Our anonymous project page: https://xmu-xiaoma666.github.io/Projects/X-Oscar/. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: ICML2024

arXiv:2405.00700 [pdf]

Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Ya** Li, Shuo Wu, Shanguang Zhao, **glin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. While till now, the artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr… ▽ More Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. While till now, the artificial neuron devices with high efficiency, high stability and low power consumption are still far from practical application. Due to the special insulator-metal phase transition, Vanadium Dioxide (VO2) has been considered as an idea candidate for neuronal device fabrication. However, its intrinsic insulating state requires the VO2 neuronal device to be driven under large bias voltage, resulting in high power consumption and low frequency. Thus in the current study, we have addressed this challenge by preparing oxygen vacancies modulated VO2 film(VO2-x) and fabricating the VO2-x neuronal devices for Spiking Neural Networks (SNNs) construction. Results indicate the neuron devices can be operated under lower voltage with improved processing speed. The proposed VO2-x based back-propagation SNNs (BP-SNNs) system, trained with the MNIST dataset, demonstrates excellent accuracy in image recognition. Our study not only demonstrates the VO2-x based neurons and SNN system for practical application, but also offers an effective way to optimize the future neuromorphic computing systems by defect engineering strategy. △ Less

Submitted 16 April, 2024; originally announced May 2024.

Comments: 18 pages,4 figures

arXiv:2404.19209 [pdf, other]

AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices

Authors: Zheng Lin, Bin Guo, Sicong Liu, Wentao Zhou, Yasan Ding, Yu Zhang, Zhiwen Yu

Abstract: Deep neural network (DNN) has driven extensive applications in mobile technology. However, for long-running mobile apps like voice assistants or video applications on smartphones, energy efficiency is critical for battery-powered devices. The rise of heterogeneous processors in mobile devices today has introduced new challenges for optimizing energy efficiency. Our key insight is that partitioning… ▽ More Deep neural network (DNN) has driven extensive applications in mobile technology. However, for long-running mobile apps like voice assistants or video applications on smartphones, energy efficiency is critical for battery-powered devices. The rise of heterogeneous processors in mobile devices today has introduced new challenges for optimizing energy efficiency. Our key insight is that partitioning computations across different processors for parallelism and speedup doesn't necessarily correlate with energy consumption optimization and may even increase it. To address this, we present AdaOper, an energy-efficient concurrent DNN inference system. It optimizes energy efficiency on mobile heterogeneous processors while maintaining responsiveness. AdaOper includes a runtime energy profiler that dynamically adjusts operator partitioning to optimize energy efficiency based on dynamic device conditions. We conduct preliminary experiments, which show that AdaOper reduces energy consumption by 16.88% compared to the existing concurrent method while ensuring real-time performance. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18829 [pdf, other]

Disentangling the development of collective flow in high energy proton proton collisions with a multiphase transport model

Authors: Liang Zheng, Lian Liu, Zi-Wei Lin, Qi-Ye Shou, Zhong-Bao Yin

Abstract: In this work, we investigate the collective flow development in high energy proton proton (pp) collisions with a multiphase transport model (AMPT) based on PYTHIA8 initial conditions with a sub-nucleon structure. It is found that the PYTHIA8 based AMPT model can reasonably describe both the charged hadron productions and elliptic flow experimental data measured in pp collisions at $\sqrt{s}=13$ Te… ▽ More In this work, we investigate the collective flow development in high energy proton proton (pp) collisions with a multiphase transport model (AMPT) based on PYTHIA8 initial conditions with a sub-nucleon structure. It is found that the PYTHIA8 based AMPT model can reasonably describe both the charged hadron productions and elliptic flow experimental data measured in pp collisions at $\sqrt{s}=13$ TeV. By turning on the parton and hadron rescatterings in AMPT separately, we find that the observed collective flow in pp collisions is largely developed during the parton evolutions, while no significant flow effect can be generated with the pure hadronic rescatterings. It is also shown that the parton escape mechanism is important for describing both the magnitude of the two-particle cumulant and the sign of the four-particle cumulants. We emphasize that the strong mass ordering of the elliptic flow results from the coalescence process in the transport model can thus be regarded as unique evidence related to the creation of deconfined parton matter in high energy pp collisions. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18533 [pdf, other]

Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

Authors: Meng Li, Haoran **, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang

Abstract: Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic a… ▽ More Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic and non-deterministic, e.g. case study or human evaluation, hindering the development of the field. To bridge the gap, we approach concept-based explanation evaluation via faithfulness and readability. We first introduce a formal definition of concept generalizable to diverse concept-based explanations. Based on this, we quantify faithfulness via the difference in the output upon perturbation. We then provide an automatic measure for readability, by measuring the coherence of patterns that maximally activate a concept. This measure serves as a cost-effective and reliable substitute for human evaluation. Finally, based on measurement theory, we describe a meta-evaluation method for evaluating the above measures via reliability and validity, which can be generalized to other tasks as well. Extensive experimental analysis has been conducted to validate and inform the selection of concept evaluation measures. △ Less

Submitted 29 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18173 [pdf, other]

Eigenvector overlaps in large sample covariance matrices and nonlinear shrinkage estimators

Authors: Zeqin Lin, Guangming Pan

Abstract: Consider a data matrix $Y = [\mathbf{y}_1, \cdots, \mathbf{y}_N]$ of size $M \times N$, where the columns are independent observations from a random vector $\mathbf{y}$ with zero mean and population covariance $Σ$. Let $\mathbf{u}_i$ and $\mathbf{v}_j$ denote the left and right singular vectors of $Y$, respectively. This study investigates the eigenvector/singular vector overlaps… ▽ More Consider a data matrix $Y = [\mathbf{y}_1, \cdots, \mathbf{y}_N]$ of size $M \times N$, where the columns are independent observations from a random vector $\mathbf{y}$ with zero mean and population covariance $Σ$. Let $\mathbf{u}_i$ and $\mathbf{v}_j$ denote the left and right singular vectors of $Y$, respectively. This study investigates the eigenvector/singular vector overlaps $\langle {\mathbf{u}_i, D_1 \mathbf{u}_j} \rangle$, $\langle {\mathbf{v}_i, D_2 \mathbf{v}_j} \rangle$ and $\langle {\mathbf{u}_i, D_3 \mathbf{v}_j} \rangle$, where $D_k$ are general deterministic matrices with bounded operator norms. We establish the convergence in probability of these eigenvector overlaps toward their deterministic counterparts with explicit convergence rates, when the dimension $M$ scales proportionally with the sample size $N$. Building on these findings, we offer a more precise characterization of the loss for Ledoit and Wolf's nonlinear shrinkage estimators of the population covariance $Σ$. △ Less

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.17808 [pdf, other]

Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

Authors: Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Peng Liu, Hui Chen, Guiguang Ding

Abstract: Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus while kee** all tokens that have be… ▽ More Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus while kee** all tokens that have been merged in the vocabulary, it unavoidably holds tokens that primarily represent subwords of complete words and appear infrequently on their own in the text corpus. We term such tokens as Scaffold Tokens. Due to their infrequent appearance in the text corpus, Scaffold Tokens pose a learning imbalance issue for language models. To address that issue, we propose Scaffold-BPE, which incorporates a dynamic scaffold token removal mechanism by parameter-free, computation-light, and easy-to-implement modifications to the original BPE. This novel approach ensures the exclusion of low-frequency Scaffold Tokens from the token representations for the given texts, thereby mitigating the issue of frequency imbalance and facilitating model training. On extensive experiments across language modeling tasks and machine translation tasks, Scaffold-BPE consistently outperforms the original BPE, well demonstrating its effectiveness and superiority. △ Less

Submitted 27 April, 2024; originally announced April 2024.

Showing 51–100 of 2,219 results for author: Lin, Z