
Showing 1–32 of 32 results for author: Jiang, N

Searching in archive eess.

  1. arXiv:2403.18621  [pdf, other]

    cs.IT eess.SP

    Performance Analysis of Integrated Sensing and Communication Networks with Blockage Effects

    Authors: Zezhong Sun, Shi Yan, Ning Jiang, Jiaen Zhou, Mugen Peng

    Abstract: Communication-sensing integration represents an up-and-coming area of research, enabling wireless networks to simultaneously perform communication and sensing tasks. However, in urban cellular networks, blockage by buildings results in a complex signal propagation environment, affecting the performance analysis of integrated sensing and communication (ISAC) networks. To overcome this obstacle,…

    Submitted 15 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE Transactions on Vehicular Technology

  2. arXiv:2403.09536  [pdf]

    eess.SY

    Mixed Algorithm of SINDy and HAVOK for Measure-Based Analysis of Power System with Inverter-based Resources

    Authors: Reza Saeed Kandezy, John Ning Jiang

    Abstract: Artificial intelligence and machine learning are enhancing electric grids by offering data analysis tools that can be used to operate the power grid more reliably. However, the complex nonlinear dynamics, particularly when coupled with multi-scale interactions among inverter-based renewable energy resources, call for effective algorithms for power system applications. This paper presents effective…

    Submitted 14 March, 2024; originally announced March 2024.

  3. arXiv:2401.03697  [pdf, other]

    cs.SD eess.AS

    An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

    Authors: Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie

    Abstract: This paper describes our audio-quality-based multi-strategy approach for the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. Specifically, our approach adopts different extraction strategies based on the audio quality, striking a balance between interference removal and speech preservation, which benefits the back-en…

    Submitted 6 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  4. arXiv:2312.09747  [pdf, other]

    eess.AS eess.SP

    SELM: Speech Enhancement Using Discrete Tokens and Language Models

    Authors: Ziqian Wang, Xinfa Zhu, Zihan Zhang, YuanJun Lv, Ning Jiang, Guoqing Zhao, Lei Xie

    Abstract: Language models (LMs) have recently shown superior performance in various speech generation tasks, demonstrating their powerful ability for semantic context modeling. Given the intrinsic similarity between speech generation and speech enhancement, harnessing semantic information holds potential advantages for speech enhancement tasks. In light of this, we propose SELM, a novel paradigm for speech…

    Submitted 7 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  5. arXiv:2310.17101  [pdf, other]

    eess.AS cs.SD

    Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning

    Authors: Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie

    Abstract: This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions. To this end, we propose a novel contrastive learning-based TTS approach to transfer style and emotion across speakers. Specifically, contrastive learning from different levels, i.e., utterance and category level, is leveraged to extract the disentangled style, em…

    Submitted 25 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures; Accepted by ICME 2024

  6. arXiv:2310.14278  [pdf, other]

    cs.SD cs.CL eess.AS

    Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

    Authors: Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie

    Abstract: Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Due to irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system, extending the C…

    Submitted 27 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: TASLP

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

  7. arXiv:2310.04760  [pdf, other]

    eess.AS cs.SD

    Multi-objective Progressive Clustering for Semi-supervised Domain Adaptation in Speaker Verification

    Authors: Ze Li, Yuke Lin, Ning Jiang, Xiaoyi Qin, Guoqing Zhao, Haiying Wu, Ming Li

    Abstract: Utilizing the pseudo-labeling algorithm with large-scale unlabeled data has become crucial for semi-supervised domain adaptation in speaker verification tasks. In this paper, we propose a novel pseudo-labeling method named Multi-objective Progressive Clustering (MoPC), specifically designed for semi-supervised domain adaptation. Firstly, we utilize limited labeled data from the target domain to deriv…

    Submitted 7 October, 2023; originally announced October 2023.

  8. Driving behavior-guided battery health monitoring for electric vehicles using machine learning

    Authors: Nanhua Jiang, Jiawei Zhang, Weiran Jiang, Yao Ren, Jing Lin, Edwin Khoo, Ziyou Song

    Abstract: An accurate estimation of the state of health (SOH) of batteries is critical to ensuring the safe and reliable operation of electric vehicles (EVs). Feature-based machine learning methods have exhibited enormous potential for rapidly and precisely monitoring battery health status. However, simultaneously using various health indicators (HIs) may weaken estimation performance due to feature redunda…

    Submitted 25 September, 2023; originally announced September 2023.

    Journal ref: Applied Energy (2024)

  9. arXiv:2309.14109  [pdf, other]

    eess.AS cs.SD

    Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification

    Authors: Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li

    Abstract: It is widely acknowledged that discriminative representations for speaker verification can be extracted from verbal speech. However, how much speaker information non-verbal vocalization carries remains an open question. This paper explores speaker verification based on the most ubiquitous form of non-verbal voice, laughter. First, we use a semi-automatic pipeline to collect a new Haha-Pod dataset fro…

    Submitted 9 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: accepted by ASRU 2023

  10. arXiv:2308.08766  [pdf, other]

    eess.AS cs.SD

    The DKU-MSXF Speaker Verification System for the VoxCeleb Speaker Recognition Challenge 2023

    Authors: Ze Li, Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li

    Abstract: This paper is the system description of the DKU-MSXF System for Tracks 1, 2, and 3 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). For Track 1, we utilize a network structure based on ResNet for training. By constructing a cross-age QMF training set, we achieve a substantial improvement in system performance. For Track 2, we inherit the pre-trained model from Track 1 an…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2210.05092

  11. arXiv:2308.07595  [pdf, other]

    eess.AS

    The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023

    Authors: Ming Cheng, Weiqing Wang, Xiaoyi Qin, Yuke Lin, Ning Jiang, Guoqing Zhao, Ming Li

    Abstract: This paper describes the DKU-MSXF submission to track 4 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). Our system pipeline contains voice activity detection, clustering-based diarization, overlapped speech detection, and target-speaker voice activity detection, where each procedure has a fused output from 3 sub-models. Finally, we fuse different clustering-based and TSVAD-based di…

    Submitted 16 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

  12. arXiv:2308.07056  [pdf, other]

    eess.AS cs.MM cs.SD

    VoxBlink: A Large Scale Speaker Verification Dataset on Camera

    Authors: Yuke Lin, Xiaoyi Qin, Guoqing Zhao, Ming Cheng, Ning Jiang, Haiyang Wu, Ming Li

    Abstract: In this paper, we introduce a large-scale and high-quality audio-visual speaker verification dataset, named VoxBlink. We propose an innovative and robust automatic audio-visual data mining pipeline to curate this dataset, which contains 1.45M utterances from 38K speakers. Due to the inherent nature of automated data collection, introducing noisy data is inevitable. Therefore, we also utilize a mul…

    Submitted 12 December, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted by ICASSP 2024

  13. arXiv:2307.04630  [pdf, other]

    cs.SD eess.AS

    The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

    Authors: Kun Song, Yi Lei, Peikun Chen, Yiqing Cao, Kun Wei, Yongmao Zhang, Lei Xie, Ning Jiang, Guoqing Zhao

    Abstract: This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task, which aims to translate multi-source English speech into Chinese speech. The system is built in a cascaded manner consisting of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS). We make tremendous efforts to handle the challenging multi-source input. Spec…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: IWSLT@ACL 2023 system paper. Our submitted system ranks 1st in the S2ST task of the IWSLT 2023 evaluation campaign

  14. arXiv:2307.04133  [pdf, other]

    eess.IV cs.CV

    Ultrasonic Image's Annotation Removal: A Self-supervised Noise2Noise Approach

    Authors: Yuanheng Zhang, Nan Jiang, Zhaoheng Xie, Junying Cao, Yueyang Teng

    Abstract: Accurately annotated ultrasonic images are vital components of a high-quality medical report. Hospitals often have strict guidelines on the types of annotations that should appear on imaging results. However, manually inspecting these images can be a cumbersome task. While a neural network could potentially automate the process, training such a model typically requires a dataset of paired input an…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 10 pages, 7 figures

  15. arXiv:2306.05297  [pdf]

    eess.IV cs.CV

    Connectional-Style-Guided Contextual Representation Learning for Brain Disease Diagnosis

    Authors: Gongshu Wang, Ning Jiang, Yunxiao Ma, Tiantian Liu, Duanduan Chen, Jinglong Wu, Guoqi Li, Dong Liang, Tianyi Yan

    Abstract: Structural magnetic resonance imaging (sMRI) has shown great clinical value and has been widely used in deep learning (DL) based computer-aided brain disease diagnosis. Previous approaches focused on local shapes and textures in sMRI that may be significant only within a particular domain. The learned representations are likely to contain spurious information and have a poor generalization ability…

    Submitted 8 June, 2023; originally announced June 2023.

  16. arXiv:2302.11224  [pdf, other]

    cs.CL cs.SD eess.AS

    MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition

    Authors: Jiaming Zhou, Shiwan Zhao, Ning Jiang, Guoqing Zhao, Yong Qin

    Abstract: End-to-end automatic speech recognition (ASR) usually suffers from performance degradation when applied to a new domain due to domain shift. Unsupervised domain adaptation (UDA) aims to improve the performance on the unlabeled target domain by transferring knowledge from the source to the target domain. To improve transferability, existing UDA approaches mainly focus on matching the distributions…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted to ICASSP 2023

  17. Information Bottleneck-Inspired Type Based Multiple Access for Remote Estimation in IoT Systems

    Authors: Meiyi Zhu, Chunyan Feng, Caili Guo, Nan Jiang, Osvaldo Simeone

    Abstract: Type-based multiple access (TBMA) is a semantics-aware multiple access protocol for remote inference. In TBMA, codewords are reused across transmitting sensors, with each codeword being assigned to a different observation value. Existing TBMA protocols are based on fixed shared codebooks and on conventional maximum-likelihood or Bayesian decoders, which require knowledge of the distributions of ob…

    Submitted 5 April, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: 5 pages, 3 figures, accepted by IEEE Signal Processing Letters (SPL)

  18. arXiv:2210.17349  [pdf, other]

    cs.SD eess.AS

    Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

    Authors: Kun Song, Jian Cong, Xinsheng Wang, Yongmao Zhang, Lei Xie, Ning Jiang, Haiying Wu

    Abstract: In the current two-stage neural text-to-speech (TTS) paradigm, it is ideal to have a universal neural vocoder, once trained, which is robust to imperfect mel-spectrograms predicted from the acoustic model. To this end, we propose the Robust MelGAN vocoder by solving the original multi-band MelGAN's metallic sound problem and increasing its generalization ability. Specifically, we introduce a fine-grained n…

    Submitted 2 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: Accepted by ISCSLP 2022

  19. arXiv:2207.00883  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

    Authors: Kun Wei, Pengcheng Guo, Ning Jiang

    Abstract: Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance by self-attention layers. However, for scenarios like conversational speech, such utterance-level modeling will neglect con…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: Accepted by Interspeech 2022

  20. arXiv:2204.08910  [pdf, other]

    eess.SP

    Adaptable Semantic Compression and Resource Allocation for Task-Oriented Communications

    Authors: Chuanhong Liu, Caili Guo, Yang Yang, Nan Jiang

    Abstract: Task-oriented communication is a new paradigm that aims at providing efficient connectivity for accomplishing intelligent tasks rather than the reception of every transmitted bit. In this paper, a deep learning-based task-oriented communication architecture is proposed where the user extracts, compresses and transmits semantics in an end-to-end (E2E) manner. Furthermore, an approach is proposed to…

    Submitted 19 April, 2022; originally announced April 2022.

  21. arXiv:2201.01051  [pdf]

    cs.CR eess.SP stat.ML

    Open Access Dataset for Electromyography based Multi-code Biometric Authentication

    Authors: Ashirbad Pradhan, Jiayuan He, Ning Jiang

    Abstract: Recently, surface electromyogram (EMG) has been proposed as a novel biometric trait for addressing some key limitations of current biometrics, such as spoofing and liveness. The EMG signals possess a unique characteristic: they are inherently different for individuals (biometrics), and they can be customized to realize multi-length codes or passwords (for example, by performing different gestures)…

    Submitted 5 January, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

    Comments: manuscript for open access dataset (paper and appendix)

    Journal ref: Sci Data 9, 733 (2022)

  22. arXiv:2104.13873  [pdf, other]

    cs.NI eess.SP

    Evaluating the Performance of Over-the-Air Time Synchronization for 5G and TSN Integration

    Authors: Haochuan Shi, Adnan Aijaz, Nan Jiang

    Abstract: The IEEE 802.1 time-sensitive networking (TSN) standards aim at improving the real-time capabilities of standard Ethernet. TSN is widely recognized as the long-term replacement of proprietary technologies for industrial control systems. However, wired connectivity alone is not sufficient to meet the requirements of future industrial systems. The fifth-generation (5G) mobile/cellular technology has…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: accepted for IEEE BlackSeaCom 2021

  23. arXiv:2103.15295  [pdf, other]

    eess.IV cs.CV

    Best-Buddy GANs for Highly Detailed Image Super-Resolution

    Authors: Wenbo Li, Kun Zhou, Lu Qi, Liying Lu, Nianjuan Jiang, Jiangbo Lu, Jiaya Jia

    Abstract: We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input. Recently, generative adversarial networks (GANs) have become popular for hallucinating details. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task. Also, GAN-generated fake details may…

    Submitted 27 December, 2021; v1 submitted 28 March, 2021; originally announced March 2021.

  24. arXiv:2103.06015  [pdf]

    eess.SP

    Performance Optimization of Surface Electromyography (sEMG) based Biometric Sensing System for both Verification and Identification

    Authors: Ashirbad Pradhan, Jiayuan He, Ning Jiang

    Abstract: Recently, surface electromyography (sEMG) has emerged as a novel biometric authentication method. Since EMG system parameters, such as the feature extraction methods and the number of channels, are known to affect system performance, it is important to investigate these effects on the performance of the sEMG-based biometric system to determine optimal system parameters. In this study, three rob…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: 12 pages, 6 figures, and one table

  25. arXiv:2009.06782  [pdf, other]

    eess.SP eess.SY

    Analysis of Random Access in NB-IoT Networks with Three Coverage Enhancement Groups: A Stochastic Geometry Approach

    Authors: Yan Liu, Yansha Deng, Nan Jiang, Maged Elkashlan, Arumugam Nallanathan

    Abstract: NarrowBand-Internet of Things (NB-IoT) is a new 3GPP radio access technology designed to provide better coverage for Low Power Wide Area (LPWA) networks. To provide reliable connections with extended coverage, a repetition transmission scheme and up to three Coverage Enhancement (CE) groups are introduced into NB-IoT during both Random Access CHannel (RACH) procedure and data transmission procedur…

    Submitted 14 September, 2020; originally announced September 2020.

    Comments: 15 pages, 8 figures. Accepted in IEEE TWC

  26. arXiv:2005.01092  [pdf, other]

    cs.NI eess.SP

    A Decoupled Learning Strategy for Massive Access Optimization in Cellular IoT Networks

    Authors: Nan Jiang, Yansha Deng, Arumugam Nallanathan, Jinghong Yuan

    Abstract: Cellular-based networks are expected to offer connectivity for massive Internet of Things (mIoT) systems. However, their Random Access CHannel (RACH) procedure suffers from unreliability due to collisions caused by simultaneous massive access. Although this collision problem has been treated in existing RACH schemes, these schemes usually organize IoT devices' transmission and re-transmissi…

    Submitted 3 May, 2020; originally announced May 2020.

  27. arXiv:2004.04979  [pdf, other]

    cs.CV cs.LG eess.IV

    Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos

    Authors: Jiawei Liu, Zheng-Jun Zha, Xierong Zhu, Na Jiang

    Abstract: Person re-identification aims at identifying a certain pedestrian across non-overlapping camera networks. Video-based re-identification approaches have gained significant attention recently, expanding image-based approaches by learning features from multiple frames. In this work, we propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos. It…

    Submitted 11 May, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

  28. arXiv:2002.07759  [pdf, other]

    cs.NI eess.SP

    Traffic Prediction and Random Access Control Optimization: Learning and Non-learning based Approaches

    Authors: Nan Jiang, Yansha Deng, Arumugam Nallanathan

    Abstract: Random access schemes in modern wireless communications are generally based on framed ALOHA (f-ALOHA), which can be optimized by flexibly organizing devices' transmission and re-transmission. However, this optimization is generally intractable due to the lack of information about complex traffic generation statistics and the occurrence of random collisions. In this article, we first summari…

    Submitted 18 February, 2020; originally announced February 2020.

  29. arXiv:2002.03082  [pdf, other]

    cs.LG cs.HC eess.AS

    RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

    Authors: Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang

    Abstract: This paper presents a deep reinforcement learning algorithm for online accompaniment generation, with potential for real-time interactive human-machine duet improvisation. Different from offline music generation and harmonization, online music accompaniment requires the algorithm to respond to human input and generate the machine counterpart in a sequential order. We cast this as a reinforcement l…

    Submitted 7 February, 2020; originally announced February 2020.

  30. arXiv:1907.11064  [pdf, other]

    cs.NI eess.SP

    Online Supervised Learning for Traffic Load Prediction in Framed-ALOHA Networks

    Authors: Nan Jiang, Yansha Deng, Osvaldo Simeone, Arumugam Nallanathan

    Abstract: Predicting the current backlog, or traffic load, in framed-ALOHA networks enables the optimization of resource allocation, e.g., of the frame size. However, this prediction is made difficult by the lack of information about the cardinality of collisions and by possibly complex packet generation statistics. Assuming no prior information about the traffic model, apart from a bound on its temporal me…

    Submitted 25 July, 2019; originally announced July 2019.

  31. Simultaneous induction of SSMVEP and SMR Using a Gaiting video stimulus: a novel hybrid brain-computer interface

    Authors: Xin Zhang, Guanghua Xu, Aravind Ravi, Sarah Pearce, Ning Jiang

    Abstract: We propose a novel visual stimulus for brain-computer interfaces. The stimulus takes the form of a human gaiting sequence. The hypothesis is that observing such a visual stimulus would simultaneously induce 1) steady-state motion visual evoked potential (SSMVEP) in the occipital area, similarly to an SSVEP stimulus; and 2) sensorimotor rhythm (SMR) in the primary sensorimotor area, because such ac…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: 22 pages, 7 figures and 2 tables

    Journal ref: Journal of Neural Engineering, Mar. 2020

  32. arXiv:1511.03722  [pdf, other]

    cs.LG cs.AI eess.SY stat.ME stat.ML

    Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

    Authors: Nan Jiang, Lihong Li

    Abstract: We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL in real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer from high variance. In this work, we extend the doubl…

    Submitted 26 May, 2016; v1 submitted 11 November, 2015; originally announced November 2015.

    Comments: 14 pages; 4 figures; ICML 2016