
Showing 1–50 of 67 results for author: Dong, D

Searching in archive cs.
  1. arXiv:2406.16554  [pdf, other]

    cs.CL

    LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

    Authors: Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng

    Abstract: Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, training MoE from scratch in a large-scale setting still suffers from data-hungry and instability problems. Motivated by this limit, we investigate building MoE models from existing dense large language models. Specifically, based on the well-known LLaMA-2 7B mod…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.11256  [pdf, other]

    cs.CL

    Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

    Authors: Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng

    Abstract: Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially when the number of tasks scales. However, previous methods simply merge all training tasks (e.g. creative writing, coding, and mathematics) and apply fixed sampling weights, without considering the importance of different tasks as the model training state changes. In this way, the most helpful data c…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.02500  [pdf, other]

    cs.LG cs.AI

    Demystifying the Compression of Mixture-of-Experts Through a Unified Framework

    Authors: Shwai He, Daize Dong, Liang Ding, Ang Li

    Abstract: Scaling large language models has revolutionized the performance across diverse domains, yet the continual growth in model size poses significant challenges for real-world deployment. The Mixture of Experts (MoE) approach addresses this by dynamically selecting and activating only a subset of experts, significantly reducing computational costs while maintaining high performance. However, MoE intro…

    Submitted 24 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 20 pages, 15 figures, 5 tables

  4. arXiv:2405.17870  [pdf, other]

    cs.DC

    Full-Stack Allreduce on Multi-Rail Networks

    Authors: Enda Yu, Dezun Dong, Xiangke Liao

    Abstract: The high communication costs impede scalability in distributed systems. Multimodal models like Sora exacerbate this issue by requiring more resources than current networks can support. However, existing network architectures fail to address this gap. In this paper, we provide full-stack support for allreduce on multi-rail networks, aiming to overcome the scalability limitations of large-scale netw…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Submitted to SC'2024

  5. arXiv:2405.06948  [pdf, other]

    cs.CV

    Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation

    Authors: Shengyuan Liu, Bo Wang, Ye Ma, Te Yang, Xipeng Cao, Quan Chen, Han Li, Di Dong, Peng Jiang

    Abstract: Existing subject-driven text-to-image generation models suffer from tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. For generating compositional subjects, it often encounters problems such as object missing and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 26 pages, 13 figures

  6. arXiv:2404.13391  [pdf, other]

    eess.SY cs.LG math.OC

    Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context

    Authors: Jianyu Xu, Qiuzhuang Sun, Yang Yang, Huadong Mo, Daoyi Dong

    Abstract: The 2019-20 Australia bushfire incurred numerous economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected due to bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the s…

    Submitted 20 April, 2024; originally announced April 2024.

  7. arXiv:2403.15750  [pdf, other]

    cs.CV

    iDAT: inverse Distillation Adapter-Tuning

    Authors: Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Daize Dong, Suncheng Xiang, Ting Liu, Yuzhuo Fu

    Abstract: Adapter-Tuning (AT) method involves freezing a pre-trained model and introducing trainable adapter modules to acquire downstream knowledge, thereby calibrating the model for better adaptation to downstream tasks. This paper proposes a distillation framework for the AT method instead of crafting a carefully designed adapter module, which aims to improve fine-tuning performance. For the first time,…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 10 pages, 9 figures, 13 tables. This paper has been accepted by ICME 2024

  8. arXiv:2403.09195  [pdf, other]

    cs.CV

    SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration

    Authors: Yanfei Song, Bangzheng Pu, Peng Wang, Hongxu Jiang, Dong Dong, Yongxiang Cao, Yiqing Shen

    Abstract: Segment Anything Model (SAM) has garnered significant attention in segmentation tasks due to its zero-shot generalization ability. However, a broader application of SAMs to real-world practice has been restricted by their low inference speed and high computational memory demands, which mainly stem from the attention mechanism. Existing work concentrated on optimizing the encoder, yet has not ade…

    Submitted 17 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  9. arXiv:2402.02464  [pdf, other]

    cs.LG cs.AI cs.SI

    A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer

    Authors: Zhangyang Gao, Daize Dong, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li

    Abstract: Can we model Non-Euclidean graphs as pure language or even Euclidean vectors while retaining their inherent information? The Non-Euclidean property has posed a long-term challenge in graph modeling. Despite recent graph neural networks and graph transformers efforts encoding graphs as Euclidean vectors, recovering the original graph from vectors remains a challenge. In this paper, we introduce Gr…

    Submitted 29 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  10. arXiv:2401.11724  [pdf, other]

    cs.CV cs.AI

    Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image Classification

    Authors: Chun Liu, Longwei Yang, Dongmei Dong, Zheng Li, Wei Yang, Zhigang Han, Jiayao Wang

    Abstract: Few-shot hyperspectral image classification aims to identify the classes of each pixel in the images by only marking few of these pixels. And in order to obtain the spatial-spectral joint features of each pixel, the fixed-size patches centering around each pixel are often used for classification. However, observing the classification results of existing methods, we found that boundary patches corr…

    Submitted 22 January, 2024; originally announced January 2024.

  11. arXiv:2401.02708  [pdf, other]

    cs.LG cs.AI stat.ML

    TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis

    Authors: Liwen Zhang, Lianzhen Zhong, Fan Yang, Di Dong, Hui Hui, Jie Tian

    Abstract: A core challenge in survival analysis is to model the distribution of censored time-to-event data, where the event of interest may be a death, failure, or occurrence of a specific event. Previous studies have shown that ranking and maximum likelihood estimation (MLE) loss functions are widely used for survival analysis. However, ranking loss only focuses on the ranking of survival time and does not…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 9 pages, 6 figures

  12. arXiv:2401.01571  [pdf, other]

    cs.SE cs.PL

    CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

    Authors: Xiaoheng Xie, Gang Fan, Xiaojun Lin, Ang Zhou, Shijie Li, Xunjin Zheng, Yinan Liang, Yu Zhang, Na Yu, Haokun Li, Xinyu Chen, Yingzhuang Chen, Yi Zhen, Dejun Dong, Xianjin Fu, Jinzhou Su, Fuxiong Pan, Pengshuai Luo, Youzheng Feng, Ruoxiang Hu, Jing Fan, Jinguo Zhou, Xiao Xiao, Peng Di

    Abstract: In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design. CodeFuse-Query reimagines code analysis as a data compu…

    Submitted 3 January, 2024; originally announced January 2024.

  13. arXiv:2311.07766  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Vision-Language Integration in Multimodal Video Transformers (Partially) Aligns with the Brain

    Authors: Dota Tianai Dong, Mariya Toneva

    Abstract: Integrating information from multiple modalities is arguably one of the essential prerequisites for grounding artificial intelligence systems with an understanding of the real world. Recent advances in video transformers that jointly learn from vision, text, and sound over time have made some progress toward this goal, but the degree to which these models integrate information from modalities stil…

    Submitted 13 November, 2023; originally announced November 2023.

  14. arXiv:2310.15204  [pdf]

    cs.LG

    Mid-Long Term Daily Electricity Consumption Forecasting Based on Piecewise Linear Regression and Dilated Causal CNN

    Authors: Zhou Lan, Ben Liu, Yi Feng, Danhuang Dong, Peng Zhang

    Abstract: Daily electricity consumption forecasting is a classical problem. Existing forecasting algorithms tend to have decreased accuracy on special dates like holidays. This study decomposes the daily electricity consumption series into three components: trend, seasonal, and residual, and constructs a two-stage prediction method using piecewise linear regression as a filter and Dilated Causal CNN as a pr…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Key words: Daily electricity consumption forecasting; time series decomposition; piecewise linear regression; Dilated Causal CNN

  15. arXiv:2310.00518  [pdf, other]

    quant-ph cs.AI

    Learning Informative Latent Representation for Quantum State Tomography

    Authors: Hailan Ma, Zhenhong Sun, Daoyi Dong, Dong Gong

    Abstract: Quantum state tomography (QST) is the process of reconstructing the complete state of a quantum system (mathematically described as a density matrix) through a series of different measurements. These measurements are performed on a number of identical copies of the quantum system, with outcomes gathered as frequencies. QST aims to recover the density matrix and the corresponding properties of the…

    Submitted 30 September, 2023; originally announced October 2023.

  16. arXiv:2308.07622  [pdf, other]

    cs.MM

    EMID: An Emotional Aligned Dataset in Audio-Visual Modality

    Authors: Jialing Zou, Jiahao Mei, Guangze Ye, Tianyu Huai, Qiwei Shen, Daoguo Dong

    Abstract: In this paper, we propose Emotionally paired Music and Image Dataset (EMID), a novel dataset designed for the emotional matching of music and images, to facilitate auditory-visual cross-modal tasks such as generation and retrieval. Unlike existing approaches that primarily focus on semantic correlations or roughly divided emotional relations, EMID emphasizes the significance of emotional consisten…

    Submitted 15 August, 2023; originally announced August 2023.

  17. arXiv:2306.03387  [pdf, other]

    cs.AI

    ColdNAS: Search to Modulate for User Cold-Start Recommendation

    Authors: Shiguang Wu, Yaqing Wang, Qinghe Jing, Daxiang Dong, Dejing Dou, Quanming Yao

    Abstract: Making personalized recommendation for cold-start users, who only have a few interaction histories, is a challenging problem in recommendation systems. Recent works leverage hypernetworks to directly map user interaction histories to user-specific parameters, which are then used to modulate predictor by feature-wise linear modulation function. These works obtain the state-of-the-art performance. H…

    Submitted 6 June, 2023; originally announced June 2023.

  18. arXiv:2305.13850  [pdf, other]

    cs.CL cs.AI

    Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

    Authors: Xiangnan Chen, Qian Xiao, Juncheng Li, Duo Dong, Jun Lin, Xiaozhong Liu, Siliang Tang

    Abstract: Visual Relation Extraction (VRE) is a powerful means of discovering relationships between entities within visually-rich documents. Existing methods often focus on manipulating entity features to find pairwise relations, yet neglect the more fundamental structural information that links disparate entity pairs together. The absence of global structure information may make the model struggle to learn…

    Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by EMNLP 2023 (Findings)

  19. arXiv:2305.11643  [pdf, other]

    cs.RO

    Time Optimal Ergodic Search

    Authors: Dayi Dong, Henry Berger, Ian Abraham

    Abstract: Robots with the ability to balance time against the thoroughness of search have the potential to provide time-critical assistance in applications such as search and rescue. Current advances in ergodic coverage-based search methods have enabled robots to completely explore and search an area in a fixed amount of time. However, optimizing time against the quality of autonomous ergodic search has yet…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: 13 pages, 8 figures, Robotics: Science and Systems

  20. arXiv:2305.05433  [pdf, other]

    quant-ph cs.LG eess.SY

    Tomography of Quantum States from Structured Measurements via quantum-aware transformer

    Authors: Hailan Ma, Zhenhong Sun, Daoyi Dong, Chunlin Chen, Herschel Rabitz

    Abstract: Quantum state tomography (QST) is the process of reconstructing the state of a quantum system (mathematically described as a density matrix) through a series of different measurements, which can be solved by learning a parameterized function to translate experimentally measured statistics into physical density matrices. However, the specific structure of quantum measurements for characterizing a q…

    Submitted 17 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  21. arXiv:2304.11384  [pdf, other]

    cs.SE

    Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning

    Authors: Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, Xiangke Liao

    Abstract: Code comment generation aims at generating natural language descriptions for a code snippet to facilitate developers' program comprehension activities. Despite being studied for a long time, a bottleneck for existing approaches is that given a code snippet, they can only generate one comment while developers usually need to know information from diverse perspectives such as what is the functionali…

    Submitted 14 June, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

    Comments: Accepted by the 46th International Conference on Software Engineering (ICSE 2024)

  22. arXiv:2302.14312  [pdf, other]

    quant-ph cs.LG eess.SY

    Auxiliary Task-based Deep Reinforcement Learning for Quantum Control

    Authors: Shumin Zhou, Hailan Ma, Sen Kuang, Daoyi Dong

    Abstract: Due to its property of not requiring prior knowledge of the environment, reinforcement learning has significant potential for quantum control problems. In this work, we investigate the effectiveness of continuous control policies based on deep deterministic policy gradient. To solve the sparse reward signal in quantum learning control problems, we propose an auxiliary task-based deep reinforcement…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: 13 pages, 11 figures

  23. arXiv:2302.00282  [pdf, other]

    cs.DC

    Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices

    Authors: Zhang Runhua, Jiang Hongxu, Tian Fangzheng, Geng Jinkun, Li Xiaobin, Ma Yuhang, Zhu Chenhui, Dong Dong, Li Xin, Wang Haojie

    Abstract: Edge computing has been emerging as a popular scenario for model inference. However, the inference performance on edge devices (e.g., Multi-Core DSP, FPGA, etc.) suffers from inefficiency due to the lack of highly optimized inference frameworks. Previous model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration to edge-based inference. Bes…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: The preliminary version is accepted by the 28th International Conference on Database Systems for Advanced Applications (DASFAA-2023)

  24. arXiv:2211.05528  [pdf, other]

    cs.LG

    PAD-Net: An Efficient Framework for Dynamic Networks

    Authors: Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, Dacheng Tao

    Abstract: Dynamic networks, e.g., Dynamic Convolution (DY-Conv) and the Mixture of Experts (MoE), have been extensively explored as they can considerably improve the model's representation power with acceptable computational cost. The common practice in implementing dynamic networks is to convert the given static layers into fully dynamic ones where all parameters are dynamic (at least within a single layer…

    Submitted 31 May, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: Proceedings of ACL 2023

  25. arXiv:2211.04310  [pdf, other]

    cs.RO

    Safety-Critical Ergodic Exploration in Cluttered Environments via Control Barrier Functions

    Authors: Cameron Lerch, Dayi Dong, Ian Abraham

    Abstract: In this paper, we address the problem of safe trajectory planning for autonomous search and exploration in constrained, cluttered environments. Guaranteeing safe (collision-free) trajectories is a challenging problem that has garnered significant attention due to its importance in the successful utilization of robots in search and exploration tasks. This work contributes a method that generates guaranteed s…

    Submitted 29 April, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

  26. arXiv:2210.16640  [pdf]

    eess.IV cs.CV eess.SP q-bio.QM

    2D and 3D CT Radiomic Features Performance Comparison in Characterization of Gastric Cancer: A Multi-center Study

    Authors: Lingwei Meng, Di Dong, Xin Chen, Mengjie Fang, Rongpin Wang, Jing Li, Zaiyi Liu, Jie Tian

    Abstract: Objective: Radiomics, an emerging tool for medical image analysis, is potential towards precisely characterizing gastric cancer (GC). Whether using one-slice 2D annotation or whole-volume 3D annotation remains a long-time debate, especially for heterogeneous GC. We comprehensively compared 2D and 3D radiomic features' representation and discrimination capacity regarding GC, via three tasks. Meth…

    Submitted 29 October, 2022; originally announced October 2022.

    Comments: Published in IEEE Journal of Biomedical and Health Informatics

    Journal ref: IEEE.J.Biomed.Health.Inf. 25 (2021) 755-763

  27. arXiv:2210.04284  [pdf, other]

    cs.CL

    SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters

    Authors: Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao

    Abstract: Adapter Tuning, which freezes the pretrained language models (PLMs) and only fine-tunes a few extra modules, becomes an appealing efficient alternative to the full model fine-tuning. Although computationally efficient, the recent Adapters often increase parameters (e.g. bottleneck dimension) for matching the performance of full model fine-tuning, which we argue goes against their original intentio…

    Submitted 10 November, 2022; v1 submitted 9 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  28. arXiv:2209.04894  [pdf, ps, other]

    math.CO cs.CC

    Nearly all $k$-SAT functions are unate

    Authors: József Balogh, Dingding Dong, Bernard Lidický, Nitya Mani, Yufei Zhao

    Abstract: We prove that $1-o(1)$ fraction of all $k$-SAT functions on $n$ Boolean variables are unate (i.e., monotone after first negating some variables), for any fixed positive integer $k$ and as $n \to \infty$. This resolves a conjecture by Bollobás, Brightwell, and Leader from 2003.

    Submitted 3 October, 2023; v1 submitted 11 September, 2022; originally announced September 2022.

    Comments: 43 pages. v2 merges arXiv:2107.09233 (SODA22) and arXiv:2209.04894v1 (STOC23) along with expository improvements. This combined version is intended for journal submission

    MSC Class: 05A16; 05C65 ACM Class: G.2.1; G.2.2

  29. arXiv:2207.06667  [pdf, other]

    cs.DC cs.AI cs.LG

    Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources

    Authors: Ji Liu, Daxiang Dong, Xi Wang, An Qin, Xingjian Li, Patrick Valduriez, Dejing Dou, Dianhai Yu

    Abstract: Although more layers and more parameters generally improve the accuracy of the models, such big models generally have high computational complexity and require big memory, which exceed the capacity of small devices for inference and incurs long training time. In addition, it is difficult to afford long training time and inference time of big models even in high performance servers, as well. As an…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: To appear in Concurrency and Computation: Practice and Experience, 16 pages, 7 figures, 5 tables

  30. Subspace Phase Retrieval

    Authors: Mengchu Xu, Dekuan Dong, Jian Wang

    Abstract: In recent years, phase retrieval has received much attention in statistics, applied mathematics and optical engineering. In this paper, we propose an efficient algorithm, termed Subspace Phase Retrieval (SPR), which can accurately recover an $n$-dimensional $k$-sparse complex-valued signal $\x$ given its $Ω(k^2\log n)$ magnitude-only Gaussian samples if the minimum nonzero entry of $\x$ satisfies…

    Submitted 7 April, 2024; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: To appear in IEEE Transactions on Information Theory, 2024, 33 pages, 10 figures

  31. arXiv:2205.12363  [pdf, ps, other]

    math.CO cs.IT

    On the number of error correcting codes

    Authors: Dingding Dong, Nitya Mani, Yufei Zhao

    Abstract: We show that for a fixed $q$, the number of $q$-ary $t$-error correcting codes of length $n$ is at most $2^{(1 + o(1)) H_q(n,t)}$ for all $t \leq (1 - q^{-1})n - C_q\sqrt{n \log n}$ (for sufficiently large constant $C_q$), where $H_q(n, t) = q^n / V_q(n,t)$ is the Hamming bound and $V_q(n,t)$ is the cardinality of the radius $t$ Hamming ball. This proves a conjecture of Balogh, Treglown, and Wagne…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: 13 pages. Comments welcome!

    MSC Class: 05C30 ACM Class: G.2.2

  32. A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning

    Authors: Zhi Wang, Chunlin Chen, Daoyi Dong

    Abstract: While reinforcement learning (RL) algorithms are achieving state-of-the-art performance in various challenging tasks, they can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information. In the paper, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from be…

    Submitted 22 May, 2022; originally announced May 2022.

    Comments: Manuscript accepted by IEEE Transactions on Cybernetics, 2022, DOI: 10.1109/TCYB.2022.3170485

  33. Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

    Authors: Jinmei Liu, Zhi Wang, Chunlin Chen, Daoyi Dong

    Abstract: Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief based on some observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as…

    Submitted 13 July, 2023; v1 submitted 16 April, 2022; originally announced April 2022.

    Comments: Published in IEEE Transactions on Neural Networks and Learning Systems, 2023, DOI: 10.1109/TNNLS.2023.3281604

  34. arXiv:2204.02227  [pdf, other]

    cs.CV cs.AI

    SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution

    Authors: Shwai He, Chenbo Jiang, Daize Dong, Liang Ding

    Abstract: Dynamic convolution achieves better performance for efficient CNNs at the cost of negligible FLOPs increase. However, the performance increase can not match the significantly expanded number of parameters, which is the main bottleneck in real-world applications. Contrastively, mask-based unstructured pruning obtains a lightweight network by removing redundancy in the heavy network. In this paper,…

    Submitted 26 May, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: WACV 2023

  35. Depthwise Convolution for Multi-Agent Communication with Enhanced Mean-Field Approximation

    Authors: Donghan Xie, Zhi Wang, Chunlin Chen, Daoyi Dong

    Abstract: Multi-agent settings remain a fundamental challenge in the reinforcement learning (RL) domain due to the partial observability and the lack of accurate real-time interactions across agents. In this paper, we propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge within a large number of agents coexisting. First, we design a new communication protoc…

    Submitted 1 January, 2023; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: Accepted by IEEE Transactions on Neural Networks, 2022, DOI: 10.1109/TNNLS.2022.3230701

  36. arXiv:2112.15270  [pdf]

    cs.ET

    Echo state graph neural networks with analogue random resistor arrays

    Authors: Shaocong Wang, Yi Li, Dingchen Wang, Woyu Zhang, Xi Chen, Danian Dong, Songqi Wang, Xumeng Zhang, Peng Lin, Claudio Gallicchio, Xiaoxin Xu, Qi Liu, Kwang-Ting Cheng, Zhongrui Wang, Dashan Shang, Ming Liu

    Abstract: Recent years have witnessed an unprecedented surge of interest, from social networks to drug discovery, in learning representations of graph-structured data. However, graph neural networks, the machine learning models for handling graph-structured data, face significant challenges when running on conventional digital hardware, including von Neumann bottleneck incurred by physically separated memor…

    Submitted 30 December, 2021; originally announced December 2021.

    Comments: 24 pages, 4 figures

  37. arXiv:2110.07770  [pdf]

    cs.HC

    Human factors engineering research on single pilot operations for large commercial aircraft: Status and prospect

    Authors: Wei Xu, Yong Chen, Wenjun Dong, Dayong Dong, Liezhong Ge

    Abstract: The civil aviation community is actively exploring and developing the solutions of single pilot operations (SPO) for large commercial aircraft. Human factors engineering research for SPO has been launched, and the research mainly focuses on three research solutions: flight deck airborne equipment upgrade, flight support from ground stations, and the combined SPO solution of "flight deck airborne equ…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: in Chinese language

  38. Residual Tensor Train: A Quantum-inspired Approach for Learning Multiple Multilinear Correlations

    Authors: Yiwei Chen, Yu Pan, Daoyi Dong

    Abstract: States of quantum many-body systems are defined in a high-dimensional Hilbert space, where rich and complex interactions among subsystems can be modelled. In machine learning, complex multiple multilinear correlations may also exist within input features. In this paper, we present a quantum-inspired multilinear model, named Residual Tensor Train (ResTT), to capture the multiple multilinear correla…

    Submitted 1 August, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

    Comments: 12 pages, 6 figures

  39. arXiv:2106.10796  [pdf, other]

    cs.LG cs.DC

    CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation

    Authors: Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao

    Abstract: Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combining with parallel communication mechanism method like pipeline, gradient compression technique can greatly alleviate the impact of communication overhead. However, there exist two problems of gradient compression technique to be solved. F…

    Submitted 6 September, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: 12 pages

  40. arXiv:2106.01674  [pdf, other]

    cs.IR cs.DC cs.LG

    JIZHI: A Fast and Cost-Effective Model-As-A-Service System for Web-Scale Online Inference at Baidu

    Authors: Hao Liu, Qian Gao, Jiang Li, Xiaochao Liao, Hao Xiong, Guangxing Chen, Wenlin Wang, Guobao Yang, Zhiwei Zha, Daxiang Dong, Dejing Dou, Haoyi Xiong

    Abstract: In modern internet industries, deep learning based recommender systems have become an indispensable building block for a wide spectrum of applications, such as search engine, news feed, and short video clips. However, it remains challenging to carry the well-trained deep models for online real-time inference serving, with respect to the time-varying web-scale traffics from billions of users, in a…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: Accepted to SIGKDD 2021 applied data science track

  41. Rule-Based Reinforcement Learning for Efficient Robot Navigation with Space Reduction

    Authors: Yuanyang Zhu, Zhi Wang, Chunlin Chen, Daoyi Dong

    Abstract: For real-world deployments, it is critical to allow robots to navigate in complex environments autonomously. Traditional methods usually maintain an internal map of the environment, and then design several simple rules, in conjunction with a localization and planning approach, to navigate through the internal map. These approaches often involve a variety of assumptions and prior knowledge. In cont…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted by IEEE/ASME Transactions on Mechatronics, 2021, DOI: 10.1109/TMECH.2021.3072675

    Journal ref: IEEE/ASME Transactions on Mechatronics, 2021

  42. arXiv:2104.02774  [pdf, other]

    cs.CR cs.LG eess.SY

    Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks

    Authors: Jianyu Xu, Bin Liu, Huadong Mo, Daoyi Dong

    Abstract: The cybersecurity of smart grids has become one of the key problems in developing reliable modern power and energy systems. This paper introduces a non-stationary adversarial cost with a variation constraint for smart grids and enables us to investigate the problem of optimal smart grid protection against cyber attacks in a relatively practical scenario. In particular, a Bayesian multi-node bandit (MN…

    Submitted 20 February, 2021; originally announced April 2021.

    Journal ref: Automatica, 2021

  43. arXiv:2101.02034  [pdf, other]

    cs.LG cs.AI quant-ph

    Deep Reinforcement Learning with Quantum-inspired Experience Replay

    Authors: Qing Wei, Hailan Ma, Chunlin Chen, Daoyi Dong

    Abstract: In this paper, a novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay. In contrast to the traditional experience replay mechanism in DRL, the proposed deep reinforcement learning with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences from the replay buffer according to the complexity and the replayed…

    Submitted 6 January, 2021; originally announced January 2021.

  44. arXiv:2012.15427  [pdf, other]

    quant-ph cs.LG eess.SY

    Curriculum-based Deep Reinforcement Learning for Quantum Control

    Authors: Hailan Ma, Daoyi Dong, Steven X. Ding, Chunlin Chen

    Abstract: Deep reinforcement learning has been recognized as an efficient technique to design optimal strategies for different complex systems without prior knowledge of the control landscape. To achieve a fast and precise control for quantum systems, we propose a novel deep reinforcement learning approach by constructing a curriculum consisting of a set of intermediate tasks defined by a fidelity threshold…

    Submitted 2 January, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

  45. arXiv:2012.05396  [pdf, other]

    cs.DC

    SSD-SGD: Communication sparsification for distributed deep learning training

    Authors: Yemao Xu, Dezun Dong, Yawei Zhao, Weixia Xu, Xiangke Liao

    Abstract: Intensive communication and synchronization cost for gradients and parameters is the well-known bottleneck of distributed deep learning training. Based on the observations that Synchronous SGD (SSGD) obtains good convergence accuracy while asynchronous SGD (ASGD) delivers a faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits, aiming at tackling the commu…

    Submitted 9 April, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

  46. arXiv:2010.08191  [pdf, other]

    cs.CL cs.IR

    RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering

    Authors: Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Wayne Xin Zhao, Daxiang Dong, Hua Wu, Haifeng Wang

    Abstract: In open-domain question answering, dense passage retrieval has become a new paradigm to retrieve relevant passages for finding answers. Typically, the dual-encoder architecture is adopted to learn dense representations of questions and passages for semantic matching. However, it is difficult to effectively train a dual-encoder due to the challenges including the discrepancy between training and in…

    Submitted 12 May, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: NAACL 2021

  47. Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments

    Authors: Zhi Wang, Chunlin Chen, Daoyi Dong

    Abstract: Evolution strategies (ES), as a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available due to better parallelization. In this paper, we propose a systematic incremental learning method for ES in dynamic environm…

    Submitted 31 March, 2022; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems, 2022, DOI: 10.1109/TNNLS.2022.3160173

  48. arXiv:2008.09943  [pdf, other]

    cs.CL cs.AI quant-ph

    Quantum Language Model with Entanglement Embedding for Question Answering

    Authors: Yiwei Chen, Yu Pan, Daoyi Dong

    Abstract: Quantum Language Models (QLMs), in which words are modelled as quantum superpositions of sememes, have demonstrated a high level of model transparency and good post-hoc interpretability. Nevertheless, in the current literature word sequences are basically modelled as a classical mixture of word states, which cannot fully exploit the potential of a quantum probabilistic description. A full quantum mod…

    Submitted 20 December, 2021; v1 submitted 22 August, 2020; originally announced August 2020.

  49. Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

    Authors: Zhi Wang, Chunlin Chen, Daoyi Dong

    Abstract: A central capability of a long-lived reinforcement learning (RL) agent is to incrementally adapt its behavior as its environment changes, and to incrementally build upon previous experiences to facilitate future learning in real-world scenarios. In this paper, we propose LifeLong Incremental Reinforcement Learning (LLIRL), a new incremental algorithm for efficient lifelong adaptation to dynamic en…

    Submitted 12 February, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems, 2021

  50. arXiv:2006.08181  [pdf, ps, other]

    math.OC cs.CC math.NA

    Derivative-free global minimization for a class of multiple minima problems

    Authors: Xiaopeng Luo, Xin Xu, Daoyi Dong

    Abstract: We prove that the finite-difference based derivative-free descent (FD-DFD) methods are capable of finding the global minima for a class of multiple minima problems. Our main result shows that, for a class of multiple minima objectives that is extended from strongly convex functions with Lipschitz-continuous gradients, the iterates of FD-DFD converge to the global minimizer $x_*$ with the linear…

    Submitted 25 June, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 14 pages, 3 figures

    MSC Class: 65K05; 68Q25; 90C26; 90C56