
Showing 1–24 of 24 results for author: Liang, Q

Searching in archive eess.
  1. Secure Degree of Freedom of Wireless Networks Using Collaborative Pilots

    Authors: Yingbo Hua, Qingpeng Liang, Md Saydur Rahman

    Abstract: A wireless network of full-duplex nodes/users, using anti-eavesdropping channel estimation (ANECE) based on collaborative pilots, can yield a positive secure degree-of-freedom (SDoF) regardless of the number of antennas an eavesdropper may have. This paper presents novel results on SDoF of ANECE by analyzing secret-key capacity (SKC) of each pair of nodes in a network of multiple collaborative nod…

    Submitted 21 September, 2023; originally announced September 2023.

  2. arXiv:2309.11895

    cs.SD cs.AI cs.CL eess.AS

    Audio Contrastive based Fine-tuning

    Authors: Yang Wang, Qibin Liang, Chenghao Xiao, Yizhi Li, Noura Al Moubayed, Chenghua Lin

    Abstract: Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications. There still remains a challenge of striking the right balance between fitting the model to the training data (avoiding overfitting) and enabling it to generalise well to a new domain. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuni…

    Submitted 19 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Under review

  3. arXiv:2301.04488

    cs.SD cs.HC cs.LG eess.AS

    WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning

    Authors: Kejun Zhang, Xinda Wu, Tieyao Zhang, Zhijie Huang, Xu Tan, Qihao Liang, Songruoyao Wu, Lingyun Sun

    Abstract: Although deep learning has revolutionized music generation, existing methods for structured melody generation follow an end-to-end left-to-right note-by-note generative paradigm and treat each note equally. Here, we present WuYun, a knowledge-enhanced deep learning architecture for improving the structure of generated melodies, which first generates the most structurally important notes to constru…

    Submitted 14 March, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  4. arXiv:2209.06054

    cs.SD cs.LG cs.MM eess.AS

    SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias

    Authors: Zihao Wang, Qihao Liang, Kejun Zhang, Yuxing Wang, Chen Zhang, Pengfei Yu, Yongsheng Feng, Wenbo Liu, Yikai Wang, Yuntai Bao, Yiheng Yang

    Abstract: Real-time music accompaniment generation has a wide range of applications in the music industry, such as music education and live performances. However, automatic real-time music accompaniment generation is still understudied and often faces a trade-off between logical latency and exposure bias. In this paper, we propose SongDriver, a real-time music accompaniment generation system without logical…

    Submitted 13 October, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: *Both Zihao Wang and Qihao Liang contribute equally to the paper and share the co-first authorship. This paper has been accepted by ACM Multimedia 2022, oral session, full paper (main track)

  5. arXiv:2208.13916

    eess.AS cs.CL cs.SD

    A Language Agnostic Multilingual Streaming On-Device ASR System

    Authors: Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani

    Abstract: On-device end-to-end (E2E) models have shown improvements over a conventional model on English Voice Search tasks in both quality and latency. E2E models have also shown promising results for multilingual automatic speech recognition (ASR). In this paper, we extend our previous capacity solution to streaming applications and present a streaming multilingual E2E ASR system that runs fully on device…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: Accepted in Interspeech 2022

  6. arXiv:2208.13322

    cs.CL cs.SD eess.AS

    Streaming Intended Query Detection using E2E Modeling for Continued Conversation

    Authors: Shuo-yiin Chang, Guru Prakash, Zelin Wu, Qiao Liang, Tara N. Sainath, Bo Li, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman

    Abstract: In voice-enabled applications, a predetermined hotword is usually used to activate a device in order to attend to the query. However, speaking queries followed by a hotword each time introduces a cognitive burden in continued conversations. To avoid repeating a hotword, we propose a streaming end-to-end (E2E) intended query detector that identifies the utterances directed towards the device and filters…

    Submitted 28 August, 2022; originally announced August 2022.

    Comments: 5 pages, Interspeech 2022

  7. arXiv:2208.13321

    cs.CL cs.SD eess.AS

    Turn-Taking Prediction for Natural Conversational Speech

    Authors: Shuo-yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He

    Abstract: While a streaming voice assistant system has been used in many applications, this system typically focuses on unnatural, one-shot interactions assuming input from a single voice query without hesitation or disfluency. However, a common conversational utterance often involves multiple queries with turn-taking, in addition to disfluencies. These disfluencies include pausing to think, hesitations, wo…

    Submitted 28 August, 2022; originally announced August 2022.

    Comments: 5 pages, Interspeech 2022

  8. arXiv:2204.06164

    eess.AS cs.LG cs.SD

    A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

    Authors: Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman

    Abstract: In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Namely, with the dynamic cascaded encoder model, we explore three techniques to maximally boost the performance of each model size: 1) Use separa…

    Submitted 24 June, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2022

  9. arXiv:2204.03793

    eess.AS cs.LG cs.SD

    Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

    Authors: Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw

    Abstract: Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers. In this work, we present Personal VAD 2.0, a personalized voice activity detector that detects the voice activity of a target speaker, as part of a streaming on-device ASR system. Although…

    Submitted 24 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2022

  10. arXiv:2202.12169

    eess.AS cs.LG stat.ML

    Closing the Gap between Single-User and Multi-User VoiceFilter-Lite

    Authors: Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw

    Abstract: VoiceFilter-Lite is a speaker-conditioned voice separation model that plays a crucial role in improving speech recognition and speaker verification by suppressing overlapping speech from non-target speakers. However, one limitation of VoiceFilter-Lite, and other speaker-conditioned speech models in general, is that these models are usually limited to a single target speaker. This is undesirable as…

    Submitted 26 April, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  11. arXiv:2201.07312

    cs.DC eess.SY

    Model-driven Cluster Resource Management for AI Workloads in Edge Clouds

    Authors: Qianlin Liang, Walid A. Hanafy, Ahmed Ali-Eldin, Prashant Shenoy

    Abstract: Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by these applications. Resource-constrained edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for perf…

    Submitted 18 January, 2022; originally announced January 2022.

  12. arXiv:2201.00429

    eess.IV cs.CV

    Image Denoising with Control over Deep Network Hallucination

    Authors: Qiyuan Liang, Florian Cassayre, Haley Owsianko, Majed El Helou, Sabine Süsstrunk

    Abstract: Deep image denoisers achieve state-of-the-art results but with a hidden cost. As witnessed in recent literature, these deep networks are capable of overfitting their training distributions, causing inaccurate hallucinations to be added to the output and generalizing poorly to varying data. For better control and interpretability over a deep denoiser, we propose a novel framework exploiting a denoi…

    Submitted 2 January, 2022; originally announced January 2022.

    Comments: Published in Electronic Imaging 2022, code available at https://github.com/IVRL/CCID

  13. arXiv:2107.01201

    eess.AS cs.LG cs.SD

    Multi-user VoiceFilter-Lite via Attentive Speaker Embedding

    Authors: Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw

    Abstract: In this paper, we propose a solution to allow speaker conditioned speech models, such as VoiceFilter-Lite, to support an arbitrary number of enrolled users in a single pass. This is achieved by using an attention mechanism on multiple speaker embeddings to compute a single attentive embedding, which is then used as a side input to the model. We implemented multi-user VoiceFilter-Lite and evaluated…

    Submitted 8 November, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

  14. arXiv:2106.09311

    eess.IV cs.CV

    Controllable Confidence-Based Image Denoising

    Authors: Haley Owsianko, Florian Cassayre, Qiyuan Liang

    Abstract: Image denoising is a classic restoration problem. Yet, current deep learning methods are subject to the problems of generalization and interpretability. To mitigate these problems, in this project, we present a framework that is capable of controllable, confidence-based noise removal. The framework is based on the fusion between two different denoised images, both derived from the same noisy input…

    Submitted 17 June, 2021; originally announced June 2021.

  15. arXiv:2104.13970

    eess.AS cs.LG cs.SD

    Personalized Keyphrase Detection using Speaker and Environment Information

    Authors: Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw

    Abstract: In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary. The system is implemented with an end-to-end trained automatic speech recognition (ASR) model and a text-independent speaker verification model. To address the challenge of detecting these keyphrases under various noisy conditio…

    Submitted 15 June, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

  16. arXiv:2011.10798

    eess.AS cs.SD

    A Better and Faster End-to-End Model for Streaming ASR

    Authors: Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

    Abstract: End-to-end (E2E) models have shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay the predictions towards the end and thus has much higher partial latency compared to a conventional ASR model. To address this i…

    Submitted 11 February, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: Accepted in ICASSP 2021

  17. arXiv:2011.06110

    eess.AS cs.SD

    Efficient Knowledge Distillation for RNN-Transducer Models

    Authors: Sankaran Panchapagesan, Daniel S. Park, Chung-Cheng Chiu, Yuan Shangguan, Qiao Liang, Alexander Gruenstein

    Abstract: Knowledge Distillation is an effective method of transferring knowledge from a large model to a smaller model. Distillation can be viewed as a type of model compression, and has played an important role for on-device ASR applications. In this paper, we develop a distillation method for RNN-Transducer (RNN-T) models, a popular end-to-end neural network architecture for streaming speech recognition.…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: 5 pages, 1 figure, 2 tables; submitted to ICASSP 2021

  18. arXiv:2006.11555

    cs.LG eess.SP stat.ML

    A deep convolutional neural network model for rapid prediction of fluvial flood inundation

    Authors: Syed Kabir, Sandhya Patidar, Xilin Xia, Qiuhua Liang, Jeffrey Neal, Gareth Pender

    Abstract: Most of the two-dimensional (2D) hydraulic/hydrodynamic models are still computationally too demanding for real-time applications. In this paper, an innovative modelling approach based on a deep convolutional neural network (CNN) method is presented for rapid prediction of fluvial flood inundation. The CNN model is trained using outputs from a 2D hydraulic model (i.e. LISFLOOD-FP) to predict water…

    Submitted 16 September, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: 45 pages, 14 figures, 7 tables

    Journal ref: J. Hydrol. 125481 (2020)

  19. arXiv:2005.10627

    eess.AS cs.CL cs.LG cs.SD

    Dynamic Sparsity Neural Networks for Automatic Speech Recognition

    Authors: Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

    Abstract: In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, multiple models with different sparsity levels usually need to be separately trained and deployed to heterogeneous target hardware with different resource specifications and for applications that h…

    Submitted 8 February, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: ICASSP 2021. (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  20. arXiv:1912.07960

    eess.SP cs.IT

    Capacity Characterization for Reconfigurable Intelligent Surfaces Assisted Multiple-Antenna Multicast

    Authors: Linsong Du, Shihai Shao, Gang Yang, Jianhui Ma, Qinpeng Liang, Youxi Tang

    Abstract: The reconfigurable intelligent surface (RIS), which consists of a large number of passive and low-cost reflecting elements, has been recognized as a revolutionary technology to enhance the performance of future wireless networks. This paper considers an RIS assisted multicast transmission, where a base station (BS) with multiple-antenna multicasts common message to multiple single-antenna mobile u…

    Submitted 24 May, 2021; v1 submitted 17 December, 2019; originally announced December 2019.

  21. arXiv:1909.12408

    cs.CL cs.LG eess.AS

    Optimizing Speech Recognition For The Edge

    Authors: Yuan Shangguan, Jian Li, Qiao Liang, Raziel Alvarez, Ian McGraw

    Abstract: While most deployed speech recognition systems today still run on servers, we are in the midst of a transition towards deployments on edge devices. This leap to the edge is powered by the progression from traditional speech recognition pipelines to end-to-end (E2E) neural architectures, and the parallel development of more efficient neural network topologies and optimization techniques. Thus, we a…

    Submitted 6 February, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

  22. arXiv:1908.10992

    cs.CL cs.SD eess.AS

    Two-Pass End-to-End Speech Recognition

    Authors: Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu

    Abstract: The requirements for many applications of state-of-the-art speech recognition systems include not only low word error rate (WER) but also low latency. Specifically, for many use-cases, the system must be able to decode utterances in a streaming fashion and faster than real-time. Recently, a streaming recurrent neural network transducer (RNN-T) end-to-end (E2E) model has shown to be a good candidat…

    Submitted 28 August, 2019; originally announced August 2019.

  23. arXiv:1907.12898

    cs.CV cs.LG eess.IV stat.ML

    A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs

    Authors: Ling Jiang, Yang Hu, Xilin Xia, Qiuhua Liang, Andrea Soltoggio

    Abstract: The shortage of high-resolution urban digital elevation model (DEM) datasets has been a challenge for modelling urban flood and managing its risk. A solution is to develop effective approaches to reconstruct high-resolution DEMs from their low-resolution equivalents that are more widely available. However, the current high-resolution DEM reconstruction approaches mainly focus on natural topography…

    Submitted 8 October, 2019; v1 submitted 19 July, 2019; originally announced July 2019.

  24. arXiv:1805.00362

    eess.SP

    A code-free optical undersampling technique for broadband microwave spectrum measurement

    Authors: Guangyu Gao, Xueshuang Xiang, Qijun Liang, Naijin Liu

    Abstract: A novel broadband microwave (MW) spectrum measurement (BMSM) scheme based on code-free optical undersampling and homodyne detection is proposed. The fully analog generation of optical pulses with a far-less-than-Nyquist rate is only through modulating cascaded electrooptical modulators by a single RF tone instead of any high-speed coding sequence modulation. Homodyne detection will reduce the anal…

    Submitted 31 July, 2019; v1 submitted 29 April, 2018; originally announced May 2018.

    Comments: 3 pages and 7 figures