Showing 1–50 of 169 results for author: Zhu, P

Searching in archive cs.
  1. arXiv:2406.12297  [pdf, other]

    cs.LG cs.AI

    Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System

    Authors: Ji Xu, Tianlong Xiao, **ye Yang, Panpan Zhu

    Abstract: Density peaks clustering (DP) has the ability of detecting clusters of arbitrary shape and clustering non-Euclidean space data, but its quadratic complexity in both computing and storage makes it difficult to scale for big data. Various approaches have been proposed in this regard, including MapReduce based distribution computing, multi-core parallelism, presentation transformation (e.g., kd-tree,…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper presents a novel approach FaithPDP that takes advantages of both hardware (multi-core architecture of CPU) and modern programming language (Python or Matlab for efficient vector and matrix computation) to achieve clustering result identical to vanilla DP algorithm, while the computing complexity is reduced to pseudo-linear

  2. arXiv:2406.03751  [pdf, other]

    cs.LG

    Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting

    Authors: Yifan Hu, Peiyuan Liu, Peng Zhu, Dawei Cheng, Tao Dai

    Abstract: Transformer-based and MLP-based methods have emerged as leading approaches in time series forecasting (TSF). While Transformer-based methods excel in capturing long-range dependencies, they suffer from high computational complexities and tend to overfit. Conversely, MLP-based methods offer computational efficiency and adeptness in modeling temporal dynamics, but they struggle with capturing comple…

    Submitted 6 June, 2024; originally announced June 2024.

  3. Node Injection Attack Based on Label Propagation Against Graph Neural Network

    Authors: Peican Zhu, Zechen Pan, Keke Tang, Xiaodong Cui, **huan Wang, Qi Xuan

    Abstract: Graph Neural Network (GNN) has achieved remarkable success in various graph learning tasks, such as node classification, link prediction and graph classification. The key to the success of GNN lies in its effective structure information representation through neighboring aggregation. However, the attacker can easily perturb the aggregation process through injecting fake nodes, which reveals that G…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by TCSS;DOI:10.1109/TCSS.2024.3395794

  4. arXiv:2405.11276  [pdf, other]

    cs.CV

    Visible and Clear: Finding Tiny Objects in Difference Map

    Authors: Bing Cao, Haiyu Yao, Pengfei Zhu, Qinghua Hu

    Abstract: Tiny object detection is one of the key challenges in the field of object detection. The performance of most generic detectors dramatically decreases in tiny object detection tasks. The main challenge lies in extracting effective features of tiny objects. Existing methods usually perform generation-based feature enhancement, which is seriously affected by spurious textures and artifacts, making it…

    Submitted 18 May, 2024; originally announced May 2024.

  5. arXiv:2405.06241  [pdf, other]

    cs.CV cs.RO

    MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization

    Authors: Pengcheng Zhu, Yaoming Zhuang, Baoquan Chen, Li Li, Chengdong Wu, Zhanlin Liu

    Abstract: This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping (VSLAM) based on Gaussian Splatting. Recently Gaussian Splatting-based SLAM has yielded promising results, but rely on RGB-D input and is weak in tracking. To address these limitations, we uniquely integrates advanced sparse visual odometry with a dense Gaussian Splatting scene representation for the fi…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2405.04000  [pdf, other]

    cs.RO eess.SY

    Distributed Invariant Kalman Filter for Cooperative Localization using Matrix Lie Groups

    Authors: Yizhi Zhou, Yufan Liu, Pengxiang Zhu, Xuan Wang

    Abstract: This paper studies the problem of Cooperative Localization (CL) for multi-robot systems, where a group of mobile robots jointly localize themselves by using measurements from onboard sensors and shared information from other robots. We propose a novel distributed invariant Kalman Filter (DInEKF) based on the Lie group theory, to solve the CL problem in a 3-D environment. Unlike the standard EKF wh…

    Submitted 7 May, 2024; originally announced May 2024.

  7. arXiv:2404.19578  [pdf, ps, other]

    cs.IT

    New EVENODD+ Codes with More Flexible Parameters and Lower Complexity

    Authors: Panyu Zhu

    Abstract: EVENODD+ codes are binary maximum distance separable (MDS) array codes for correcting double disk failures in RAID-6 with asymptotically optimal encoding/decoding/update complexities. However, the number of bits stored in each disk of EVENODD+ codes should be an odd number minus one. In this paper, we present a new construction of EVENODD+ codes that have more flexible parameters. The number of bi…

    Submitted 30 April, 2024; originally announced April 2024.

  8. arXiv:2404.19242  [pdf, other]

    cs.CV eess.IV stat.ME

    A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

    Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

    Abstract: Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial an…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

  9. arXiv:2404.15744  [pdf, other]

    cs.LG cs.AI cs.CR

    A General Black-box Adversarial Attack on Graph-based Fake News Detectors

    Authors: Peican Zhu, Zechen Pan, Yang Liu, Jiwei Tian, Keke Tang, Zhen Wang

    Abstract: Graph Neural Network (GNN)-based fake news detectors apply various methods to construct graphs, aiming to learn distinctive news embeddings for classification. Since the construction details are unknown for attackers in a black-box scenario, it is unrealistic to conduct the classical adversarial attacks that require a specific adjacency matrix. In this paper, we propose the first general black-box…

    Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI2024

  10. arXiv:2404.14109  [pdf, other]

    cs.CV

    CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective

    Authors: Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, Qinghua Hu

    Abstract: In this paper, we present a simple yet effective contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints. Unlike traditional knowledge distillation methods that concentrate on maximizing feature similarities or preserving class-wise semantic correlations between teacher and student features, our method attempt…

    Submitted 22 April, 2024; originally announced April 2024.

  11. arXiv:2404.09401  [pdf, other]

    cs.CV cs.AI

    Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models

    Authors: Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka

    Abstract: Diffusion Models (DMs) have shown remarkable capabilities in various image-generation tasks. However, there are growing concerns that DMs could be used to imitate unauthorized creations and thus raise copyright issues. To address this issue, we propose a novel framework that embeds personal watermarks in the generation of adversarial examples. Such examples can force DMs to generate images with vi…

    Submitted 19 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: updated references

  12. arXiv:2404.08958  [pdf, other]

    cs.CV cs.CL cs.LG

    AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

    Authors: Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, Qinghua Hu

    Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve few-shot ability of CLIP, key factors on the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introdu…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  13. arXiv:2404.07721  [pdf, other]

    eess.SP cs.IT

    Trainable Joint Channel Estimation, Detection and Decoding for MIMO URLLC Systems

    Authors: Yi Sun, Hong Shen, Bingqing Li, Wei Xu, Pengcheng Zhu, Nan Hu, Chunming Zhao

    Abstract: The receiver design for multi-input multi-output (MIMO) ultra-reliable and low-latency communication (URLLC) systems can be a tough task due to the use of short channel codes and few pilot symbols. Consequently, error propagation can occur in traditional turbo receivers, leading to performance degradation. Moreover, the processing delay induced by information exchange between different modules may…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 17 pages, 12 figures, accepted by IEEE Transactions on Wireless Communications

  14. arXiv:2403.18923  [pdf, other]

    cs.NE cs.AI cs.LG

    Nature-Guided Cognitive Evolution for Predicting Dissolved Oxygen Concentrations in North Temperate Lakes

    Authors: Runlong Yu, Robert Ladwig, Xiang Xu, Peijun Zhu, Paul C. Hanson, Yiqun Xie, Xiaowei Jia

    Abstract: Predicting dissolved oxygen (DO) concentrations in north temperate lakes requires a comprehensive study of phenological patterns across various ecosystems, which highlights the significance of selecting phenological features and feature interactions. Process-based models are limited by partial process knowledge or oversimplified feature representations, while machine learning models face challenge…

    Submitted 15 February, 2024; originally announced March 2024.

  15. arXiv:2403.15765  [pdf, other]

    cs.CV cs.AI cs.IR

    Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents

    Authors: Hao Wang, Tang Li, Chenhui Chu, Nengjun Zhu, Rui Wang, Pinpin Zhu

    Abstract: Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to consider this valuable prior information related t…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 13 pages, 7 figures, accepted by LERC-COLING2024

  16. arXiv:2403.12494  [pdf, other]

    cs.CV

    Task-Customized Mixture of Adapters for General Image Fusion

    Authors: Pengfei Zhu, Yang Sun, Bing Cao, Qinghua Hu

    Abstract: General image fusion aims at integrating important information from multi-source images. However, due to the significant cross-task gap, the respective fusion mechanism varies considerably in practice, resulting in limited performance across subtasks. To handle this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusio…

    Submitted 23 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  17. arXiv:2403.06687  [pdf, other]

    cs.LG cs.CV

    Advancing Graph Neural Networks with HL-HGAT: A Hodge-Laplacian and Attention Mechanism Approach for Heterogeneous Graph-Structured Data

    Authors: Jinghan Huang, Qiufeng Chen, Yijun Bian, Pengli Zhu, Nanguang Chen, Moo K. Chung, Anqi Qiu

    Abstract: Graph neural networks (GNNs) have proven effective in capturing relationships among nodes in a graph. This study introduces a novel perspective by considering a graph as a simplicial complex, encompassing nodes, edges, triangles, and $k$-simplices, enabling the definition of graph-structured data on any $k$-simplices. Our contribution is the Hodge-Laplacian heterogeneous graph attention network (H…

    Submitted 22 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  18. arXiv:2403.03346  [pdf, other]

    cs.CV

    Enhancing Vision-Language Pre-training with Rich Supervisions

    Authors: Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

    Abstract: We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present in using image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localiza…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  19. arXiv:2403.00014  [pdf, other]

    cs.SI cs.AI cs.LG

    GIN-SD: Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion

    Authors: Le Cheng, Peican Zhu, Keke Tang, Chao Gao, Zhen Wang

    Abstract: Source detection in graphs has demonstrated robust efficacy in the domain of rumor source identification. Although recent solutions have enhanced performance by leveraging deep neural networks, they often require complete user data. In this paper, we address a more challenging task, rumor source detection with incomplete user data, and propose a novel framework, i.e., Source Detection in Graphs wi…

    Submitted 27 February, 2024; originally announced March 2024.

    Comments: The paper is accepted by AAAI24

    Report number: Vol. 38, No. 1, 55-63

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 2024

  20. arXiv:2402.16699  [pdf, other]

    cs.RO

    SwarmPRM: Probabilistic Roadmap Motion Planning for Large-Scale Swarm Robotic Systems

    Authors: Yunze Hu, Xuru Yang, Kangjie Zhou, Qinghang Liu, Kang Ding, Han Gao, Pingping Zhu, Chang Liu

    Abstract: Large-scale swarm robotic systems consisting of numerous cooperative agents show considerable promise for performing autonomous tasks across various sectors. Nonetheless, traditional motion planning approaches often face a trade-off between scalability and solution quality due to the exponential growth of the joint state space of robots. In response, this work proposes SwarmPRM, a hierarchical, sc…

    Submitted 24 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Submitted to IROS 2024

  21. arXiv:2402.16690  [pdf, other]

    cs.RO

    Risk-Aware Non-Myopic Motion Planner for Large-Scale Robotic Swarm Using CVaR Constraints

    Authors: Xuru Yang, Yunze Hu, Han Gao, Kang Ding, Zhaoyang Li, Pingping Zhu, Ying Sun, Chang Liu

    Abstract: Swarm robotics has garnered significant attention due to its ability to accomplish elaborate and synchronized tasks. Existing methodologies for motion planning of swarm robotic systems mainly encounter difficulties in scalability and safety guarantee. To address these limitations, we propose a Risk-aware swarm mOtion planner using conditional ValuE at Risk (ROVER) that systematically navigates lar…

    Submitted 15 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures

  22. arXiv:2402.11091  [pdf, other]

    cs.MA cs.RO

    A Novel Multivariate Skew-Normal Mixture Model and Its Application in Path-Planning for Very-Large-Scale Robotic Systems

    Authors: Pingping Zhu, Chang Liu, Peter Estephan

    Abstract: This paper addresses the path-planning challenge for very large-scale robotic systems (VLSR) operating in complex and cluttered environments. VLSR systems consist of numerous cooperative agents or robots working together autonomously. Traditionally, many approaches for VLSR systems are developed based on Gaussian mixture models (GMMs), where the GMMs represent agents' evolving spatial distribution…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: American Control Conference (ACC) 2024, July 10 - 12, 2024

  23. arXiv:2402.10586  [pdf, other]

    cs.CL cs.AI

    Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs

    Authors: Zae Myung Kim, Kwang Hee Lee, Preston Zhu, Vipul Raheja, Dongyeop Kang

    Abstract: With the advent of large language models (LLM), the line between human-crafted and machine-generated texts has become increasingly blurred. This paper delves into the inquiry of identifying discernible and unique linguistic properties in texts that were written by humans, particularly uncovering the underlying discourse structures of texts beyond their surface structures. Introducing a novel metho…

    Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 26 pages, accepted at ACL 2024 (Main)

  24. arXiv:2401.06595  [pdf, other]

    cs.LG cs.AI

    Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

    Authors: Pengfei Zhu, Qian Wang, Yu Wang, Jialu Li, Qinghua Hu

    Abstract: Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes wh…

    Submitted 12 January, 2024; originally announced January 2024.

  25. arXiv:2401.06521  [pdf, other]

    cs.CV

    Exploring Diverse Representations for Open Set Recognition

    Authors: Yu Wang, Junxian Mu, Pengfei Zhu, Qinghua Hu

    Abstract: Open set recognition (OSR) requires the model to classify samples that belong to closed sets while rejecting unknown samples during test. Currently, generative models often perform better than discriminative models in OSR, but recent studies show that generative models may be computationally infeasible or unstable on complex tasks. In this paper, we provide insights into OSR and find that learning…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: 9 pages, 4 figures. Accepted to AAAI 2024

  26. arXiv:2401.02916  [pdf, other]

    cs.CV

    Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction

    Authors: Yuxin Yang, Pengfei Zhu, Mengshi Qi, Huadong Ma

    Abstract: Human trajectory forecasting is a critical challenge in fields such as robotics and autonomous driving. Due to the inherent uncertainty of human actions and intentions in real-world scenarios, various unexpected occurrences may arise. To uncover latent motion patterns in human behavior, we introduce a novel memory-based method, named Motion Pattern Priors Memory Network. Our method involves constr…

    Submitted 8 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  27. arXiv:2312.16850  [pdf, other]

    cs.SD eess.AS

    Accent-VITS:accent transfer for end-to-end TTS

    Authors: Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie

    Abstract: Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent which are entangled in speech. This paper presents a VITS-based end-to-end accent transfer model named Accent-VITS. Based on the main structure of VITS, Accent-VITS makes substantial improvements to enable…

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by NCMMSC2023

  28. arXiv:2312.16409  [pdf, other]

    cs.LG cs.CV

    Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning

    Authors: Yan Fan, Yu Wang, Pengfei Zhu, Qinghua Hu

    Abstract: Continual learning (CL) has shown promising results and comparable performance to learning at once in a fully supervised manner. However, CL strategies typically require a large number of labeled samples, making their real-life deployment challenging. In this work, we focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown c…

    Submitted 26 December, 2023; originally announced December 2023.

  29. arXiv:2312.10611  [pdf, other]

    cs.CV cs.AI

    Bi-directional Adapter for Multi-modal Tracking

    Authors: Bing Cao, Junliang Guo, Pengfei Zhu, Qinghua Hu

    Abstract: Due to the rapid development of computer vision, single-modal (RGB) object tracking has made significant progress in recent years. Considering the limitation of single imaging sensor, multi-modal images (RGB, Infrared, etc.) are introduced to compensate for this deficiency for all-weather object tracking in complex environments. However, as acquiring sufficient multi-modal tracking data is hard wh…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024. Code is available at https://github.com/SparkTempest/BAT

  30. arXiv:2311.17361  [pdf]

    cs.CV

    How does spatial structure affect psychological restoration? A method based on Graph Neural Networks and Street View Imagery

    Authors: Haoran Ma, Yan Zhang, Pengyuan Liu, Fan Zhang, Pengyu Zhu

    Abstract: The Attention Restoration Theory (ART) presents a theoretical framework with four essential indicators (being away, extent, fascinating, and compatibility) for comprehending urban and natural restoration quality. However, previous studies relied on non-sequential data and non-spatial dependent methods, which overlooks the impact of spatial structure defined here as the positional relationships bet…

    Submitted 29 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 33 pages, 7 figures, Under review

  31. arXiv:2311.08623  [pdf, other]

    cs.CV cs.CL cs.LG

    DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

    Authors: Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

    Abstract: Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes up most of the latency because of the auto-regressive decoding. To accelerate the inference, we propose an approach of performing Dynamic Early Exit on Decoder (DEED). We build a multi-exit encoder-decoder transformer model…

    Submitted 14 November, 2023; originally announced November 2023.

  32. arXiv:2311.05717  [pdf, other]

    cs.RO

    PL-CVIO: Point-Line Cooperative Visual-Inertial Odometry

    Authors: Yanyu Zhang, Pengxiang Zhu, Wei Ren

    Abstract: Low-feature environments are one of the main Achilles' heels of geometric computer vision (CV) algorithms. In most human-built scenes often with low features, lines can be considered complements to points. In this paper, we present a multi-robot cooperative visual-inertial navigation system (VINS) using both point and line features. By utilizing the covariance intersection (CI) update within the m…

    Submitted 9 November, 2023; originally announced November 2023.

  33. arXiv:2311.03650  [pdf, other]

    cs.CV

    Image Generation and Learning Strategy for Deep Document Forgery Detection

    Authors: Yamato Okamoto, Osada Genki, Iu Yahiro, Rintaro Hasegawa, Peifei Zhu, Hirokatsu Kataoka

    Abstract: In recent years, document processing has flourished and brought numerous benefits. However, there has been a significant rise in reported cases of forged document images. Specifically, recent advancements in deep neural network (DNN) methods for generative tasks may amplify the threat of document forgery. Traditional approaches for forged document images created by prevalent copy-move methods are…

    Submitted 6 November, 2023; originally announced November 2023.

  34. arXiv:2311.03419  [pdf, other]

    eess.AS cs.LG cs.SD

    Personalizing Keyword Spotting with Speaker Information

    Authors: Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno

    Abstract: Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker…

    Submitted 6 November, 2023; originally announced November 2023.

  35. arXiv:2311.02835  [pdf]

    cs.CV

    Flexible Multi-Generator Model with Fused Spatiotemporal Graph for Trajectory Prediction

    Authors: Peiyuan Zhu, Fengxia Han, Hao Deng

    Abstract: Trajectory prediction plays a vital role in automotive radar systems, facilitating precise tracking and decision-making in autonomous driving. Generative adversarial networks with the ability to learn a distribution over future trajectories tend to predict out-of-distribution samples, which typically occurs when the distribution of forthcoming paths comprises a blend of various manifolds that may…

    Submitted 5 November, 2023; originally announced November 2023.

  36. arXiv:2310.14581  [pdf, other]

    cs.CV cs.AI

    Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

    Authors: Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka

    Abstract: Large web crawl datasets have already played an important role in learning multimodal features with high generalization capabilities. However, there are still very limited studies investigating the details or improvements of data design. Recently, a DataComp challenge has been designed to propose the best training data with the fixed models. This paper presents our solution to both filtering track…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted at the ICCV 2023 Workshop on Towards the Next Generation of Computer Vision Datasets: DataComp Track

  37. arXiv:2310.00592  [pdf, other]

    quant-ph cs.ET

    Nearest neighbor synthesis of CNOT circuits on general quantum architectures

    Authors: Xinyu Chen, Mingqiang Zhu, Xueyun Cheng, Pengcheng Zhu, Zhi** Guan

    Abstract: In recent years, quantum computing has entered the Noisy Intermediate-Scale Quantum (NISQ). However, NISQ devices have inherent limitations in terms of connectivity and hardware noise, necessitating the transformation of quantum logic circuits for correct execution on NISQ chips. The synthesis of CNOT circuits considering physical constraints can transform quantum algorithms into low-level quantum…

    Submitted 1 October, 2023; originally announced October 2023.

  38. arXiv:2309.15496  [pdf, other]

    eess.AS cs.SD

    DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architecture design and intra-model knowledge distillation along with hybrid predictive coding to compensate for the lack of future information. However, DualVC…

    Submitted 18 January, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP2024

  39. arXiv:2309.08980  [pdf, other]

    cs.IT

    Differential Modulation for Short Packet Transmission in URLLC

    Authors: Canjian Zheng, Fu-Chun Zheng, Jingjing Luo, Pengcheng Zhu, Xiaohu You, Daquan Feng

    Abstract: One key feature of ultra-reliable low-latency communications (URLLC) in 5G is to support short packet transmission (SPT). However, the pilot overhead in SPT for channel estimation is relatively high, especially in high Doppler environments. In this paper, we advocate the adoption of differential modulation to support ultra-low latency services, which can ease the channel estimation burden and redu…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 15 pages, 9 figures

  40. arXiv:2309.07720  [pdf, other]

    cs.RO

    Heuristic Satisficing Inferential Decision Making in Human and Robot Active Perception

    Authors: Yucheng Chen, Pingping Zhu, Anthony Alers, Tobias Egner, Marc A. Sommer, Silvia Ferrari

    Abstract: Inferential decision-making algorithms typically assume that an underlying probabilistic model of decision alternatives and outcomes may be learned a priori or online. Furthermore, when applied to robots in real-world settings they often perform unsatisfactorily or fail to accomplish the necessary tasks because this assumption is violated and/or they experience unanticipated external pressures and…

    Submitted 14 September, 2023; originally announced September 2023.

  41. arXiv:2309.06751  [pdf, other]

    cs.CV

    Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances

    Authors: Xiangrong Zhang, Tianyang Zhang, Guanchun Wang, Peng Zhu, Xu Tang, ** Jia, Licheng Jiao

    Abstract: Remote sensing object detection (RSOD), one of the most fundamental and challenging tasks in the remote sensing field, has received longstanding attention. In recent years, deep learning techniques have demonstrated robust feature representation capabilities and led to a big leap in the development of RSOD techniques. In this era of rapid technical evolution, this review aims to present a comprehe…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by IEEE Geoscience and Remote Sensing Magazine. More than 300 papers relevant to the RSOD field were reviewed in this survey

  42. arXiv:2309.00559  [pdf, other]

    eess.SP cs.IT

    Signal Processing and Learning for Next Generation Multiple Access in 6G

    Authors: Wei Chen, Yuanwei Liu, Hamid Jafarkhani, Yonina C. Eldar, Peiying Zhu, Khaled B Letaief

    Abstract: Wireless communication systems to date primarily rely on the orthogonality of resources to facilitate the design and implementation, from user access to data transmission. Emerging applications and scenarios in the sixth generation (6G) wireless systems will require massive connectivity and transmission of a deluge of data, which calls for more flexibility in the design concept that goes beyond or…

    Submitted 9 September, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

  43. arXiv:2308.15072  [pdf, other]

    cs.LG cs.CR

    Advancing Adversarial Robustness Through Adversarial Logit Update

    Authors: Hao Xuan, Peican Zhu, Xingyu Li

    Abstract: Deep Neural Networks are susceptible to adversarial perturbations. Adversarial training and adversarial purification are among the most widely recognized defense strategies. Although these methods have different underlying logic, both rely on absolute logit values to generate label predictions. In this study, we theoretically analyze the logit difference around successful adversarial attacks from…

    Submitted 29 August, 2023; originally announced August 2023.

  44. arXiv:2308.10428  [pdf, other]

    eess.AS cs.SD

    Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models

    Authors: Heyang Xue, Shuai Guo, Pengcheng Zhu, Mengxiao Bi

    Abstract: Despite imperfect score-matching causing drift in training and sampling distributions of diffusion models, recent advances in diffusion-based acoustic models have revolutionized data-sufficient single-speaker Text-to-Speech (TTS) approaches, with Grad-TTS being a prime example. However, the sampling drift problem leads to these approaches struggling in multi-speaker scenarios in practice due to mo…

    Submitted 31 August, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

  45. A Small Form Factor Aerial Research Vehicle for Pick-and-Place Tasks with Onboard Real-Time Object Detection and Visual Odometry

    Authors: Cora A. Dimmig, Anna Goodridge, Gabriel Baraban, Pupei Zhu, Joyraj Bhowmick, Marin Kobilarov

    Abstract: This paper introduces a novel, small form-factor, aerial vehicle research platform for agile object detection, classification, tracking, and interaction tasks. General-purpose hardware components were designed to augment a given aerial vehicle and enable it to perform safe and reliable grasping. These components include a custom collision tolerant cage and low-cost Gripper Extension Package, which…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 6289-6296

  46. arXiv:2307.11342  [pdf, other]

    cs.CV

    Tuning Pre-trained Model via Moment Probing

    Authors: Mingze Gao, Qilong Wang, Zhenyi Lin, Pengfei Zhu, Qinghua Hu, Jingbo Zhou

    Abstract: Recently, efficient fine-tuning of large-scale pre-trained models has attracted increasing research interests, where linear probing (LP) as a fundamental module is involved in exploiting the final representations for task-dependent classification. However, most of the existing methods focus on how to effectively introduce a few of learnable parameters, and little work pays attention to the commonl…

    Submitted 2 October, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023; Project Page: https://github.com/mingzeG/Moment-Probing

  47. arXiv:2305.18326  [pdf, other]

    cs.CV cs.AI

    BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation

    Authors: Liyan Kang, Luyang Huang, Ningxin Peng, Peihao Zhu, Zewei Sun, Shanbo Cheng, Mingxuan Wang, Degen Huang, Jinsong Su

    Abstract: We present a large-scale video subtitle translation dataset, BigVideo, to facilitate the study of multi-modality machine translation. Compared with the widely used How2 and VaTeX datasets, BigVideo is more than 10 times larger, consisting of 4.5 million sentence pairs and 9,981 hours of videos. We also introduce two deliberately designed test sets to verify the necessity of visual information: Amb…

    Submitted 3 July, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Findings

  48. arXiv:2305.15985  [pdf, other]

    cs.IT eess.SP

    Resource Allocation in Cell-Free MU-MIMO Multicarrier System with Finite and Infinite Blocklength

    Authors: Jiafei Fu, Pengcheng Zhu, Bo Ai, Jiangzhou Wang, Xiaohu You

    Abstract: The explosive growth of data results in more scarce spectrum resources. It is important to optimize the system performance under limited resources. In this paper, we investigate how to achieve weighted throughput (WTP) maximization for cell-free (CF) multiuser MIMO (MU-MIMO) multicarrier (MC) systems through resource allocation (RA), in the cases of finite blocklength (FBL) and infinite blocklengt…

    Submitted 25 May, 2023; originally announced May 2023.

  49. arXiv:2305.15935  [pdf, ps, other]

    cs.IT eess.SP

    Grouping Method for mmWave Massive MIMO System: Exploitation of Angular Multiplexing Gain

    Authors: Peng Jiang, Pengcheng Zhu, Jiamin Li, Dongming Wang

    Abstract: A future millimeter-wave (mmWave) massive multiple-input and multiple-output (MIMO) system may serve hundreds or thousands of users at the same time; thus, research on multiple access technology is particularly important. Moreover, due to the short-wavelength nature of a mmWave, large-scale arrays are easier to implement than microwaves, while their directivity and sparseness make the physical beam…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 12 pages, 16 figures

  50. arXiv:2305.15926  [pdf, ps, other]

    cs.IT

    Joint Precoding Design and Resource Allocation for C-RAN Wireless Fronthaul Systems

    Authors: Peng Jiang, Jiafei Fu, Pengcheng Zhu, Jiamin Li, Xiaohu You

    Abstract: This paper investigates the resource allocation problem combined with fronthaul precoding and access link sparse precoding design in cloud radio access network (C-RAN) wireless fronthaul systems. Multiple remote antenna units (RAUs) in C-RAN systems can collaborate in a cluster through centralized signal processing to realize distributed massive multiple-input and multiple-output (MIMO) systems and…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 5 pages, 2 figures