
Showing 151–200 of 1,143 results for author: Xie, L

  1. arXiv:2310.18607  [pdf, other]

    physics.ins-det hep-ex

    Simulation Study of Photon-to-Digital Converter (PDC) Timing Specifications for LoLX Experiment

    Authors: Nguyen V. H. Viet, Alaa Al Masri, Masaharu Nomachi, Marc-Andre Tétrault, Soud Al Kharusi, Thomas Brunner, Christopher Chambers, Bindiya Chana, Austin de St. Croix, Eamon Egan, Marco Francesconi, David Gallacher, Luca Galli, Pietro Giampa, Damian Goeldi, Jessee Lefebvre, Chloe Malbrunot, Peter Margetak, Juliette Martin, Thomas McElroy, Mayur Patel, Bernadette Rebeiro, Fabrice Retiere, El Mehdi Rtimi, Lisa Rudolph, et al. (2 additional authors not shown)

    Abstract: The Light only Liquid Xenon (LoLX) experiment is a prototype detector designed to study liquid xenon (LXe) light properties and various photodetection technologies. LoLX also aims to quantify LXe's time resolution as a potential scintillator for 10 ps time-of-flight (TOF) PET. Another key goal of LoLX is to perform a time-based separation of Cerenkov and scintillation photons for new background r…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: 5 pages, 7 figures

  2. arXiv:2310.17101  [pdf, other]

    eess.AS cs.SD

    Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning

    Authors: Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie

    Abstract: This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions. To this end, we propose a novel contrastive learning-based TTS approach to transfer style and emotion across speakers. Specifically, contrastive learning from different levels, i.e. utterance and category level, is leveraged to extract the disentangled style, em…

    Submitted 25 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures; Accepted by ICME 2024

  3. arXiv:2310.16254  [pdf, ps, other]

    math.FA

    Directional Differentiability of the Generalized Metric Projection in Hilbert spaces and Hilbertian Bochner spaces

    Authors: Jinlu Li, Li Cheng, Lishan Liu, Linsen Xie

    Abstract: Let $H$ be a real Hilbert space and $C$ a nonempty closed and convex subset of $H$. Let $P_C: H\rightarrow C$ denote the (standard) metric projection operator. In this paper, we study the Gâteaux directional differentiability of $P_C$ and investigate some of its properties. The Gâteaux directional derivatives of $P_C$ are precisely given for the following cases of the considered subset $C$: 1. c…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: This article has been accepted for publication

    MSC Class: 49J50; 26A24; 47A58; 47J30; 49J40

  4. arXiv:2310.15291  [pdf, other]

    nucl-ex

    Nuclear charge radius of $^{26m}$Al and its implication for V$_{ud}$ in the quark-mixing matrix

    Authors: P. Plattner, E. Wood, L. Al Ayoubi, O. Beliuskina, M. L. Bissell, K. Blaum, P. Campbell, B. Cheal, R. P. de Groote, C. S. Devlin, T. Eronen, L. Filippin, R. F. García Ruíz, Z. Ge, S. Geldhof, W. Gins, M. Godefroid, H. Heylen, M. Hukkanen, P. Imgram, A. Jaries, A. Jokinen, A. Kanellakopoulos, A. Kankainen, S. Kaufmann, et al. (28 additional authors not shown)

    Abstract: Collinear laser spectroscopy was performed on the isomer of the aluminium isotope $^{26m}$Al. The measured isotope shift to $^{27}$Al in the $3s^{2}3p\;^{2}\!P^\circ_{3/2} \rightarrow 3s^{2}4s\;^{2}\!S_{1/2}$ atomic transition enabled the first experimental determination of the nuclear charge radius of $^{26m}$Al, resulting in $R_c = 3.130 \pm 0.015$ fm. This differs by 4.5 standard de…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 5 pages, 2 figures, submitted to Phys. Rev. Lett

  5. arXiv:2310.14278  [pdf, other]

    cs.SD cs.CL eess.AS

    Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

    Authors: Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie

    Abstract: Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Due to irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system, extending the C…

    Submitted 27 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: TASLP

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

  6. arXiv:2310.13425  [pdf, other]

    math.OC

    An overview of optimization approaches for scheduling and rostering resources in public transportation

    Authors: Lucas Mertens, Lena-Antonia Wolbeck, David Rößler, Lin Xie, Natalia Kliewer

    Abstract: Public transport is vital for meeting people's mobility needs. Providers need to plan their services well to offer high quality and low cost. Optimized planning can benefit providers, customers, and municipalities. The planning process for public transport involves various decision problems, such as vehicle and crew planning. These problems are usually solved by providers. More and more studies su…

    Submitted 20 October, 2023; originally announced October 2023.

  7. arXiv:2310.08529  [pdf, other]

    cs.CV cs.GR

    GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

    Authors: Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang

    Abstract: In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can help generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine ge…

    Submitted 13 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: CVPR 2024, Project page: https://taoranyi.com/gaussiandreamer/

  8. arXiv:2310.08528  [pdf, other]

    cs.CV cs.GR

    4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

    Authors: Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang

    Abstract: Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to guarantee. To achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency, we propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes rather than applying 3D-GS…

    Submitted 7 December, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Project page: https://guanjunwu.github.io/4dgs/

  9. arXiv:2310.07246  [pdf, other]

    cs.SD eess.AS

    Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

    Authors: Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, the current speech generative models are still struggling regarding speech quality and task generalization. This paper presents Vec-Tok Speech, an extensible framework that resembles multiple speech generation tasks, generating e…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 15 pages, 2 figures

  10. arXiv:2310.06854  [pdf, other]

    cs.CV cs.AI eess.IV

    Learning with Noisy Labels for Human Fall Events Classification: Joint Cooperative Training with Trinity Networks

    Authors: Leiyu Xie, Yang Sun, Syed Mohsen Naqvi

    Abstract: With the increasing ageing population, fall events classification has drawn much research attention. In the development of deep learning, the quality of data labels is crucial. Most of the datasets are labelled automatically or semi-automatically, and the samples may be mislabeled, which constrains the performance of Deep Neural Networks (DNNs). Recent research on noisy label learning confirms tha…

    Submitted 27 September, 2023; originally announced October 2023.

  11. arXiv:2310.05051  [pdf, other]

    cs.SD eess.AS

    SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

    Authors: Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Speaker anonymization aims to conceal a speaker's identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging or modifying the speaker representation. However, the anonymized speech is subject to reduction in pseudo speaker distinctiveness, speech quality and…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 8 pages, 3 figures; Accepted by ASRU2023

  12. arXiv:2310.05001  [pdf, other]

    cs.SD eess.AS

    PromptSpeaker: Speaker Generation Based on Text Descriptions

    Authors: Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li

    Abstract: Recently, text-guided content generation has received extensive attention. In this work, we explore the possibility of text description-based speaker generation, i.e., using text prompts to control the speaker generation process. Specifically, we propose PromptSpeaker, a text-guided speaker generation system. PromptSpeaker consists of a prompt encoder, a zero-shot VITS, and a Glow model, where the…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to ASRU 2023

  13. arXiv:2310.04863  [pdf, other]

    cs.SD eess.AS

    SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR

    Authors: Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie

    Abstract: Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising results in speaker-attributed automatic speech recognition (SA-ASR). Although able to obtain state-of-the-art (SOTA) performance, most of the studies are based on an autoregressive (AR) decoder which generates tokens one-by-one and results in a large real-time factor (RTF). To speed up inference, we intro…

    Submitted 7 October, 2023; originally announced October 2023.

  14. arXiv:2310.04715  [pdf, other]

    eess.AS cs.SD

    An Exploration of Task-decoupling on Two-stage Neural Post Filter for Real-time Personalized Acoustic Echo Cancellation

    Authors: Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie

    Abstract: Deep learning based techniques have been popularly adopted in acoustic echo cancellation (AEC). Utilization of speaker representation has extended the frontier of AEC, thus attracting many researchers' interest in personalized acoustic echo cancellation (PAEC). Meanwhile, task-decoupling strategies are widely adopted in speech enhancement. To further explore the task-decoupling approach, we propos…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: accepted to ASRU 2023

  15. arXiv:2310.04657  [pdf, other]

    eess.AS cs.SD

    Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

    Authors: Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

    Abstract: The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control t…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  16. arXiv:2310.04369  [pdf, other]

    cs.SD cs.LG eess.AS

    MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

    Authors: Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie

    Abstract: A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and various accompaniment components equally, which may reduce performance compared to the model that only considers vocal enhancement. In this paper, we propose a novel multi-band temporal-freque…

    Submitted 6 October, 2023; originally announced October 2023.

  17. arXiv:2310.04004  [pdf, other]

    cs.SD eess.AS

    U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

    Authors: Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie

    Abstract: Zero-shot speaker cloning aims to synthesize speech for any target speaker unseen during TTS system building, given only a single speech reference of the speaker at hand. Although more practical in real applications, the current zero-shot methods still produce speech with undesirable naturalness and speaker similarity. Moreover, endowing the target speaker with arbitrary speaking styles in the zer…

    Submitted 6 October, 2023; originally announced October 2023.

  18. arXiv:2310.03963  [pdf, other]

    cs.SD eess.AS

    Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis

    Authors: Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Zero-shot emotion transfer in cross-lingual speech synthesis aims to transfer emotion from an arbitrary speech reference in the source language to the synthetic speech in the target language. Building such a system faces challenges of unnatural foreign accents and difficulty in modeling the shared emotional expressions of different languages. Building on the DelightfulTTS neural architecture, this…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  19. arXiv:2310.03309  [pdf, other]

    cs.CL cs.AI

    Concise and Organized Perception Facilitates Reasoning in Large Language Models

    Authors: Junjie Liu, Shaotian Yan, Chen Shen, Liang Xie, Wenxiao Wang, Jieping Ye

    Abstract: Exploiting large language models (LLMs) to tackle reasoning has garnered growing attention. It still remains highly challenging to achieve satisfactory results in complex logical problems, characterized by plenty of premises within the prompt and requiring multi-hop reasoning. In particular, the reasoning capabilities of LLMs are brittle to disorder and distractibility. In this work, we first exam…

    Submitted 13 June, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: 26 pages

  20. arXiv:2310.02802  [pdf, other]

    eess.AS

    VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

    Authors: Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie

    Abstract: This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023. Following the recognition-synthesis framework, our singing conversion model is based on VITS, incorporating four key modules: a prior encoder, a posterior encoder, a decoder, and a parallel bank of transposed convolutions (PBTC) module. We particularly leverage Whisper, a powerful pre-trained ASR…

    Submitted 4 October, 2023; originally announced October 2023.

  21. arXiv:2310.02629  [pdf, other]

    cs.SD eess.AS

    BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

    Authors: Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these dr…

    Submitted 7 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  22. arXiv:2309.16964  [pdf, other]

    cs.CV

    AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi

    Authors: Yunjiao Zhou, Jianfei Yang, He Huang, Lihua Xie

    Abstract: WiFi-based pose estimation is a technology with great potential for the development of smart homes and metaverse avatar generation. However, current WiFi-based pose estimation methods are predominantly evaluated under controlled laboratory conditions with sophisticated vision models to acquire accurately labeled data. Furthermore, WiFi CSI is highly sensitive to environmental variables, and direct…

    Submitted 29 September, 2023; originally announced September 2023.

  23. arXiv:2309.16937  [pdf, other]

    cs.CL cs.SD eess.AS

    SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition

    Authors: Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie

    Abstract: Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. While self-supervised learning (SSL) models, like MMS, have demonstrated their effectiveness in multilingual ASR, it is worth noting that various layers' representations potentially contain distinct information that has not been fully leveraged. In this study, w…

    Submitted 27 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures. Accepted by ICME 2024

  24. arXiv:2309.16171  [pdf, other]

    math.ST

    Distributionally Robust Quickest Change Detection using Wasserstein Uncertainty Sets

    Authors: Liyan Xie, Yuchen Liang, Venugopal V. Veeravalli

    Abstract: The problem of quickest detection of a change in the distribution of a sequence of independent observations is considered. It is assumed that the pre-change distribution is known (accurately estimated), while the only information about the post-change distribution is through a (small) set of labeled data. This post-change data is used in a data-driven minimax robust framework, where an uncertainty…

    Submitted 28 September, 2023; originally announced September 2023.

  25. arXiv:2309.15823  [pdf, ps, other]

    math.AG math.DS

    Minimal model program for algebraically integrable foliations and generalized pairs

    Authors: Guodu Chen, Jingjun Han, Jihao Liu, Lingyao Xie

    Abstract: By systematically introducing and studying the structure of algebraically integrable generalized foliated quadruples, we establish the minimal model program for $\mathbb Q$-factorial foliated dlt algebraically integrable foliations and lc generalized pairs by proving their cone theorems, contraction theorems, and the existence of flips. We also provide numerous applications on their birational geo…

    Submitted 28 September, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: 137 pages. Minor change: remove a redundant paragraph in introduction

    MSC Class: 14E30; 37F75

  26. arXiv:2309.15635  [pdf, other]

    cs.CV

    Position and Orientation-Aware One-Shot Learning for Medical Action Recognition from Signal Data

    Authors: Leiyu Xie, Yuxing Yang, Zeyu Fu, Syed Mohsen Naqvi

    Abstract: In this work, we propose a position and orientation-aware one-shot learning framework for medical action recognition from signal data. The proposed framework comprises two stages and each stage includes signal-level image generation (SIG), cross-attention (CsA), dynamic time warping (DTW) modules and the information fusion between the proposed privacy-preserved position and orientation features. T…

    Submitted 27 September, 2023; originally announced September 2023.

  27. arXiv:2309.15496  [pdf, other]

    eess.AS cs.SD

    DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architecture design and intra-model knowledge distillation along with hybrid predictive coding to compensate for the lack of future information. However, DualVC…

    Submitted 18 January, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP2024

  28. arXiv:2309.15458  [pdf, other]

    cs.AI cs.SC

    LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints

    Authors: Weidi Xu, Jingwei Wang, Lele Xie, Jianshan He, Hongting Zhou, Taifeng Wang, Xiaopei Wan, Jingdong Chen, Chao Qu, Wei Chu

    Abstract: Integrating first-order logic constraints (FOLCs) with neural networks is a crucial but challenging problem since it involves modeling intricate correlations to satisfy the constraints. This paper proposes a novel neural layer, LogicMP, whose layers perform mean-field variational inference over an MLN. It can be plugged into any off-the-shelf neural network to encode FOLCs while retaining modulari…

    Submitted 16 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: 28 pages, 14 figures, 12 tables

  29. arXiv:2309.14717  [pdf, other]

    cs.LG cs.CL

    QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

    Authors: Yuhui Xu, Lingxi Xie, Xiaotao Gu, Xin Chen, Heng Chang, Hengheng Zhang, Zhengsu Chen, Xiaopeng Zhang, Qi Tian

    Abstract: Recent years have witnessed a rapid development of large language models (LLMs). Despite their strong ability in many language-understanding tasks, the heavy computational burden largely restricts the application of LLMs, especially when one needs to deploy them onto edge devices. In this paper, we propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm. The motivation lies in the imba…

    Submitted 9 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 16 pages

  30. arXiv:2309.13907  [pdf, other]

    cs.SD eess.AS

    HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

    Abstract: Recent advances in text-to-speech, particularly those based on Graph Neural Networks (GNNs), have significantly improved the expressiveness of short-form synthetic speech. However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging. To address this problem, we expand the capabilities of GNNs with a hierarchical prosody modeling approach, named HiGNN…

    Submitted 6 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ASRU2023

  31. arXiv:2309.13573  [pdf, other]

    cs.SD eess.AS

    The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

    Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

    Abstract: With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to tackle the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what at when" in typical meeting scenarios. We particularly established two sub-tr…

    Submitted 5 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: 8 pages, Accepted by ASRU2023

  32. arXiv:2309.13035  [pdf, other]

    cs.RO

    PyPose v0.6: The Imperative Programming Interface for Robotics

    Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, Jingnan Shi, Shaoshu Su, Yiren Lu, et al. (11 additional authors not shown)

    Abstract: PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, inco…

    Submitted 22 September, 2023; originally announced September 2023.

  33. arXiv:2309.12577  [pdf, ps, other]

    math.OC

    Distributed Optimal Control and Application to Consensus of Multi-Agent Systems

    Authors: Liping Zhang, Juanjuan Xu, Huanshui Zhang, Lihua Xie

    Abstract: This paper develops a novel approach to the consensus problem of multi-agent systems by minimizing a weighted state error with neighbor agents via linear quadratic (LQ) optimal control theory. Existing consensus control algorithms only utilize the current state of each agent, and the design of distributed controller depends on nonzero eigenvalues of the communication topology. The presented optima…

    Submitted 16 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  34. arXiv:2309.11839  [pdf, other]

    cs.CV cs.RO

    MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Shenghai Yuan, Lihua Xie

    Abstract: Multi-modal unsupervised domain adaptation (MM-UDA) for 3D semantic segmentation is a practical solution to embed semantic understanding in autonomous systems without expensive point-wise annotations. While previous MM-UDA methods can achieve overall improvement, they suffer from significant class-imbalanced performance, restricting their adoption in real applications. This imbalanced performance…

    Submitted 21 September, 2023; originally announced September 2023.

  35. Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective

    Authors: Laixin Xie, Yang Ouyang, Longfei Chen, Ziming Wu, Quan Li

    Abstract: Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches are categorized into feature imputation and label prediction and are primarily focused on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore encounter three main shortcomings in imputation, including the need…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 18 pages, 11 figures. This paper is accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)

    ACM Class: I.1.2; H.1.2; H.4.2

  36. arXiv:2309.09262  [pdf, other]

    eess.AS cs.SD

    PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

    Authors: Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, the current style voice conversion approach relies on pre-defined labels or reference speech to control the conversion process, which leads to limitations in style diversity or falls short in terms of the intuitiveness and interpretability of style representation…

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  37. arXiv:2309.08914  [pdf, other]

    cs.RO

    Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning

    Authors: Pengyu Yin, Haozhi Cao, Thien-Minh Nguyen, Shenghai Yuan, Shuyang Zhang, Kangcheng Liu, Lihua Xie

    Abstract: One-shot LiDAR localization refers to the ability to estimate the robot pose from one single point cloud, which yields significant advantages in initialization and relocalization processes. In the point cloud domain, the topic has been extensively studied as a global descriptor retrieval (i.e., loop closure detection) and pose refinement (i.e., point cloud registration) problem both in isolation o…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures

  38. arXiv:2309.06036  [pdf, other]

    eess.SP

    Which Framework is Suitable for Online 3D Multi-Object Tracking for Autonomous Driving with Automotive 4D Imaging Radar?

    Authors: Jianan Liu, Guanhua Ding, Yuxuan Xia, Jinping Sun, Tao Huang, Lihua Xie, Bing Zhu

    Abstract: Online 3D multi-object tracking (MOT) has recently received significant research interest due to the expanding demand for 3D perception in advanced driver assistance systems (ADAS) and autonomous driving (AD). Among the existing 3D MOT frameworks for ADAS and AD, the conventional point object tracking (POT) framework using the tracking-by-detection (TBD) strategy has been well studied and accepted for…

    Submitted 25 May, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures, accepted by IEEE 35th Intelligent Vehicles Symposium (IV 2024), oral presentation (top 5%), code is available at https://github.com/dinggh0817/4D_Radar_MOT

  39. arXiv:2309.05305  [pdf, other]

    cs.LG

    Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data

    Authors: Yucheng Wang, Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen

    Abstract: Multivariate Time-Series (MTS) data is crucial in various application fields. With its sequential and multi-source (multiple sensors) properties, MTS data inherently exhibits Spatial-Temporal (ST) dependencies, involving temporal correlations between timestamps and spatial correlations between sensors in each timestamp. To effectively leverage this information, Graph Neural Network-based methods (…

    Submitted 10 January, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures, Accepted by AAAI 2024

  40. arXiv:2309.05202  [pdf, other]

    cs.LG

    Graph-Aware Contrasting for Multivariate Time-Series Classification

    Authors: Yucheng Wang, Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen

    Abstract: Contrastive learning, as a self-supervised learning paradigm, has become popular for Multivariate Time-Series (MTS) classification. It ensures the consistency across different views of unlabeled samples and then learns effective representations for these samples. Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques,…

    Submitted 10 January, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: 10 pages, 7 figures, Accepted by AAAI 2024

  41. arXiv:2309.01770  [pdf, other]

    cs.CV

    StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

    Authors: Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo

    Abstract: This paper presents a LoRA-free method for stylized image generation that takes a text prompt and style reference images as inputs and produces an output image in a single pass. Unlike existing methods that rely on training a separate LoRA for each style, our method can adapt to various styles with a unified model. However, this poses two challenges: 1) the prompt loses controllability over the ge…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: AIGC

  42. arXiv:2309.01142  [pdf, other]

    eess.AS cs.SD

    MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

    Authors: Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

    Abstract: In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation. Previous work generally took explicit prosodic features or fixed-length style embeddin…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: This work was submitted on April 10, 2022 and accepted on August 29, 2023

  43. arXiv:2309.00929  [pdf, other]

    cs.SD eess.AS

    Timbre-reserved Adversarial Attack in Speaker Identification

    Authors: Qing Wang, Jixun Yao, Li Zhang, Pengcheng Guo, Lei Xie

    Abstract: As a type of biometric identification, a speaker identification (SID) system is confronted with various kinds of attacks. The spoofing attacks typically imitate the timbre of the target speakers, while the adversarial attacks confuse the SID system by adding a well-designed adversarial perturbation to an arbitrary speech. Although the spoofing attack copies a similar timbre as the victim, it does…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: 11 pages, 8 figures

  44. arXiv:2309.00883  [pdf, other]

    cs.SD eess.AS

    DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

    Authors: Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie

    Abstract: While the performance of cross-lingual TTS based on monolingual corpora has been significantly improved recently, generating cross-lingual speech still suffers from the foreign accent problem, leading to limited naturalness. Besides, current cross-lingual methods ignore modeling emotion, which is indispensable paralinguistic information in speech delivery. In this paper, we propose DiCLET-TTS, a D…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: accepted by TASLP

  45. arXiv:2308.16522  [pdf, other]

    cs.RO

    Graph-based SLAM-Aware Exploration with Prior Topo-Metric Information

    Authors: Ruofei Bai, Hongliang Guo, Wei-Yun Yau, Lihua Xie

    Abstract: Autonomous exploration requires a robot to explore an unknown environment while constructing an accurate map using Simultaneous Localization and Mapping (SLAM) techniques. Without prior information, the exploration performance is usually conservative due to the limited planning horizon. This paper exploits prior information about the environment, represented as a topo-metric graph, to benefit both…

    Submitted 30 June, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: 8 pages, 6 figures. Accepted by IEEE RA-L for publication

  46. arXiv:2308.16053  [pdf, other]

    cs.HC cs.DL

    OldVisOnline: Curating a Dataset of Historical Visualizations

    Authors: Yu Zhang, Ruike Jiang, Liwenhan Xie, Yuheng Zhao, Can Liu, Tianhong Ding, Siming Chen, Xiaoru Yuan

    Abstract: With the increasing adoption of digitization, more and more historical visualizations created hundreds of years ago are accessible in digital libraries online. It provides a unique opportunity for visualization and history research. Meanwhile, there is no large-scale digital collection dedicated to historical visualizations. The visualizations are scattered in various collections, which hinders re…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE VIS 2023

  47. arXiv:2308.12617  [pdf, ps, other]

    eess.SY cs.MA

    Quantized distributed Nash equilibrium seeking under DoS attacks: A quantized consensus based approach

    Authors: Shuai Feng, Maojiao Ye, Lihua Xie, Shengyuan Xu

    Abstract: This paper studies distributed Nash equilibrium (NE) seeking under Denial-of-Service (DoS) attacks and quantization. The players can only exchange information with their own direct neighbors. The transmitted information is subject to quantization and packet losses induced by malicious DoS attacks. We propose a quantized distributed NE seeking strategy based on the approach of dynamic quantized con…

    Submitted 24 August, 2023; originally announced August 2023.

  48. arXiv:2308.12587  [pdf, other]

    cs.CV

    Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation

    Authors: Yibo Cui, Liang Xie, Yakun Zhang, Meishan Zhang, Ye Yan, Erwei Yin

    Abstract: Cross-modal alignment is one key challenge for Vision-and-Language Navigation (VLN). Most existing studies concentrate on mapping the global instruction or single sub-instruction to the corresponding trajectory. However, another critical problem of achieving fine-grained alignment at the entity level is seldom considered. To address this problem, we propose a novel Grounded Entity-Landmark Adaptiv…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 Oral

  49. arXiv:2308.11217  [pdf, other]

    cs.LG cs.AI

    Federated Learning in Big Model Era: Domain-Specific Multimodal Large Models

    Authors: Zengxiang Li, Zhaoxiang Hou, Hui Liu, Ying Wang, Tongzhi Li, Longfei Xie, Chao Shi, Chengyi Yang, Weishan Zhang, Zelei Liu, Liang Xu

    Abstract: Multimodal data, which can comprehensively perceive and recognize the physical world, has become an essential path towards general artificial intelligence. However, multimodal large models trained on public datasets often underperform in specific industrial domains. This paper proposes a multimodal federated learning framework that enables multiple enterprises to utilize private domain data to col…

    Submitted 24 August, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

  50. arXiv:2308.09259  [pdf, other]

    cs.LG

    FRGNN: Mitigating the Impact of Distribution Shift on Graph Neural Networks via Test-Time Feature Reconstruction

    Authors: Rui Ding, Jielong Yang, Feng Ji, Xionghu Zhong, Linbo Xie

    Abstract: Due to inappropriate sample selection and limited training data, a distribution shift often exists between the training and test sets. This shift can adversely affect the test performance of Graph Neural Networks (GNNs). Existing approaches mitigate this issue by either enhancing the robustness of GNNs to distribution shift or reducing the shift itself. However, both approaches necessitate retrain…

    Submitted 13 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.