Showing 1–19 of 19 results for author: Kang, Z

Searching in archive eess.
  1. arXiv:2404.15704  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS

    Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

    Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limit improvements. To this end, we propose an adversarial comp…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  2. arXiv:2404.13892  [pdf, other]

    cs.SD cs.AI eess.AS

    Retrieval-Augmented Audio Deepfake Detection

    Authors: Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang

    Abstract: With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Conference on Multimedia Retrieval (ICMR 2024)

  3. arXiv:2310.04681  [pdf, other]

    cs.SD cs.AI eess.AS

    VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model

    Authors: Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Speaker verification (SV) performance deteriorates as utterances become shorter. To this end, we propose a new architecture called VoiceExtender which provides a promising solution for improving SV performance when handling short-duration speech signals. We use two guided diffusion models, the built-in and the external speaker embedding (SE) guided diffusion model, both of which utilize a diffusio…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted by the 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2023)

  4. arXiv:2307.12286  [pdf, ps, other]

    cs.IT eess.SP

    Double-Active-IRS Aided Wireless Communication: Deployment Optimization and Capacity Scaling

    Authors: Zhenyu Kang, Changsheng You, Rui Zhang

    Abstract: In this letter, we consider a double-active-intelligent reflecting surface (IRS) aided wireless communication system, where two active IRSs are properly deployed to assist the communication from a base station (BS) to multiple users located in a given zone via the double-reflection links. Under the assumption of fixed per-element amplification power for each active-IRS element, we formulate a rate…

    Submitted 23 July, 2023; originally announced July 2023.

  5. arXiv:2305.19581  [pdf, other]

    cs.SD cs.AI eess.AS

    SVVAD: Personal Voice Activity Detection for Speaker Verification

    Authors: Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Voice activity detection (VAD) improves the performance of speaker verification (SV) by preserving speech segments and attenuating the effects of non-speech. However, this scheme is not ideal: (1) it fails in noisy environments or multi-speaker conversations; (2) it is trained based on inaccurate non-SV sensitive labels. To address this, we propose a speaker verification-based voice activity detec…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  6. arXiv:2303.07643  [pdf, other]

    cs.SD cs.AI eess.AS

    Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification

    Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao

    Abstract: Data-Free Knowledge Distillation (DFKD) has recently attracted growing attention in the academic community, especially with major breakthroughs in computer vision. Despite promising results, the technique has not been well applied to audio and signal processing. Due to the variable duration of audio signals, it has its own unique way of modeling. In this work, we propose feature-rich audio model i…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023. International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)

  7. arXiv:2301.04311  [pdf, other]

    cs.IT eess.SP

    Active-IRS-Aided Wireless Communication: Fundamentals, Designs and Open Issues

    Authors: Zhenyu Kang, Changsheng You, Rui Zhang

    Abstract: Intelligent reflecting surface (IRS) has emerged as a promising technology to realize smart radio environment for future wireless communication systems. Existing works in this line of research have mainly considered the conventional passive IRS that reflects wireless signals without power amplification, while in this article, we give an overview of a new type of IRS, called active IRS, which enabl…

    Submitted 25 June, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  8. arXiv:2210.09524  [pdf, other]

    cs.SD cs.LG eess.AS

    SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning

    Authors: Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Estimating age from a single speech is a classic and challenging topic. Although Label Distribution Learning (LDL) can represent adjacent indistinguishable ages well, the uncertainty of the age estimate for each utterance varies from person to person, i.e., the variance of the age distribution is different. To address this issue, we propose selective variance label distribution learning (SVLDL) me…

    Submitted 16 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted by SLT 2022. The 2022 IEEE Spoken Language Technology Workshop (SLT 2022)

  9. arXiv:2207.01244  [pdf, other]

    cs.IT eess.SP

    Active-Passive IRS aided Wireless Communication: New Hybrid Architecture and Elements Allocation Optimization

    Authors: Zhenyu Kang, Changsheng You, Rui Zhang

    Abstract: Intelligent reflecting surface (IRS) has emerged as a promising technology to enhance the wireless communication network coverage and capacity by dynamically controlling the radio signal propagation environment. In contrast to the existing works that considered active or passive IRS only, we propose in this paper a new hybrid active-passive IRS architecture that consists of both active and passive…

    Submitted 4 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

  10. arXiv:2206.14597  [pdf, other]

    cs.LG cs.AI eess.SP

    Generative Anomaly Detection for Time Series Datasets

    Authors: Zhuangwei Kang, Ayan Mukhopadhyay, Aniruddha Gokhale, Shijie Wen, Abhishek Dubey

    Abstract: Traffic congestion anomaly detection is of paramount importance in intelligent traffic systems. The goals of transportation agencies are two-fold: to monitor the general traffic conditions in the area of interest and to locate road segments under abnormal congestion states. Modeling congestion patterns can achieve these goals for citywide roadways, which amounts to learning the distribution of mul…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: A shorter version of the paper was accepted at the ITSC 2022

  11. arXiv:2206.13101  [pdf, other]

    cs.SD cs.LG eess.AS

    SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

    Authors: Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao

    Abstract: Speech emotion recognition (SER) has many challenges, but one of the main challenges is that each framework does not have a unified standard. In this paper, we propose SpeechEQ, a framework for unifying SER tasks based on a multi-scale unified metric. This metric can be trained by Multitask Learning (MTL), which includes two emotion recognition tasks of Emotion States Category (EIS) and Emotion In…

    Submitted 27 July, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: This paper is accepted by Interspeech 2022

  12. arXiv:2204.02810  [pdf, other]

    cs.CV cs.SD eess.AS

    Expression-preserving face frontalization improves visually assisted speech processing

    Authors: Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda

    Abstract: Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations in order to boost the performance of visually assisted speech communication. The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translatio…

    Submitted 15 December, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: arXiv admin note: text overlap with arXiv:2202.00538

    Journal ref: International Journal of Computer Vision 131 (5), 1122-1140, 2023

  13. arXiv:2202.00538  [pdf, other]

    cs.SD cs.CV eess.AS

    The impact of removing head movements on audio-visual speech enhancement

    Authors: Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar

    Abstract: This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although being a common conversational feature, head movements have been ignored by past and recent studies: they challenge today's learning-based methods as they often degrade the performance of models that are trained on clean, frontal, and steady face images. To alleviate this problem, we propose to…

    Submitted 2 February, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

  14. arXiv:2201.02812  [pdf, other]

    eess.IV cs.CV cs.LG

    Hyperspectral Image Denoising Using Non-convex Local Low-rank and Sparse Separation with Spatial-Spectral Total Variation Regularization

    Authors: Chong Peng, Yang Liu, Yongyong Chen, Xinxin Wu, Andrew Cheng, Zhao Kang, Chenglizhao Chen, Qiang Cheng

    Abstract: In this paper, we propose a novel nonconvex approach to robust principal component analysis for HSI denoising, which focuses on simultaneously developing more accurate approximations to both rank and column-wise sparsity for the low-rank and sparse components, respectively. In particular, the new method adopts the log-determinant rank approximation and a novel $\ell_{2,\log}$ norm, to restrict the…

    Submitted 8 January, 2022; originally announced January 2022.

  15. arXiv:2105.08630  [pdf, other]

    eess.IV cs.CV cs.LG

    Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, Jin-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, et al. (13 additional authors not shown)

    Abstract: Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based d…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: text overlap with arXiv:2105.07809

  16. IRS-Aided Wireless Relaying: Optimal Deployment and Capacity Scaling

    Authors: Zhenyu Kang, Changsheng You, Rui Zhang

    Abstract: In this letter, we consider an intelligent reflecting surface (IRS)-aided wireless relaying system, where a decode-and-forward relay (R) is employed to forward data from a source (S) to a destination (D), aided by M passive reflecting elements. We consider two practical IRS deployment strategies, namely, single-IRS deployment where all reflecting elements are mounted on one single IRS that is depl…

    Submitted 27 October, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

  17. arXiv:2103.07151  [pdf, other]

    cs.IT cs.NI eess.SP

    Enabling Smart Reflection in Integrated Air-Ground Wireless Network: IRS Meets UAV

    Authors: Changsheng You, Zhenyu Kang, Yong Zeng, Rui Zhang

    Abstract: Intelligent reflecting surface (IRS) and unmanned aerial vehicle (UAV) have emerged as two promising technologies to boost the performance of wireless communication networks, by proactively altering the wireless communication channels via smart signal reflection and maneuver control, respectively. However, they face different limitations in practice, which restrain their future applications. In th…

    Submitted 29 March, 2021; v1 submitted 12 March, 2021; originally announced March 2021.

    Comments: In this article, we propose new methods to jointly apply IRS and UAV in integrated air-ground wireless networks by exploiting their complementary advantages

  18. Plant and Controller Optimization for Power and Energy Systems with Model Predictive Control

    Authors: Donald J. Docimo, Ziliang Kang, Kai A. James, Andrew G. Alleyne

    Abstract: This article explores the optimization of plant characteristics and controller parameters for electrified mobility. Electrification of mobile transportation systems, such as automobiles and aircraft, presents the ability to improve key performance metrics such as efficiency and cost. However, the strong bidirectional coupling between electrical and thermal dynamics within new components creates in…

    Submitted 14 October, 2020; originally announced October 2020.

    Journal ref: J. Dyn. Sys., Meas., Control. Aug 2021, 143(8): 081009

  19. arXiv:2004.04079  [pdf, other]

    cs.ET eess.SP

    A Noise Filter for Dynamic Vision Sensors using Self-adjusting Threshold

    Authors: Shasha Guo, Ziyang Kang, Lei Wang, Limeng Zhang, Xiaofan Chen, Shiming Li, Weixia Xu

    Abstract: Neuromorphic event-based dynamic vision sensors (DVS) have much faster sampling rates and a higher dynamic range than frame-based imagers. However, they are sensitive to background activity (BA) events which are unwanted. We propose a new criterion with little computation overhead for defining real events and BA events by utilizing the global space and time information rather than the local inform…

    Submitted 1 June, 2020; v1 submitted 8 April, 2020; originally announced April 2020.