Showing 1–30 of 30 results for author: Kuo, H

Searching in archive eess.
  1. arXiv:2402.09846

    physics.ao-ph cs.LG eess.SP

    A Deep Learning Approach to Radar-based QPE

    Authors: Ting-Shuo Yo, Shih-Hao Su, Jung-Lien Chu, Chiao-Wei Chang, Hung-Chi Kuo

    Abstract: In this study, we propose a volume-to-point framework for quantitative precipitation estimation (QPE) based on the Quantitative Precipitation Estimation and Segregation Using Multiple Sensor (QPESUMS) Mosaic Radar data set. With a data volume consisting of the time series of gridded radar reflectivities over the Taiwan area, we used machine learning algorithms to establish a statistical model for…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 22 pages, 11 figures. Published in Earth and Space Science

    Journal ref: Earth Space Sci. 2021, 8, e2020EA001340

  2. arXiv:2312.08622

    eess.AS cs.LG cs.SD

    Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification

    Authors: Haibin Wu, Heng-Cheng Kuo, Yu Tsao, Hung-yi Lee

    Abstract: Automatic speaker verification (ASV) is highly susceptible to adversarial attacks. Purification modules are usually adopted as a pre-processing step to mitigate adversarial noise. However, they are commonly implemented across diverse experimental settings, rendering direct comparisons challenging. This paper comprehensively compares mainstream purification techniques in a unified framework. We find the…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Submitted to 2024 ICASSP

  3. arXiv:2307.04517

    eess.AS

    Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility

    Authors: Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao

    Abstract: Subjective tests are the gold standard for evaluating speech quality and intelligibility; however, they are time-consuming and expensive. Thus, objective measures that align with human perceptions are crucial. This study evaluates the correlation between commonly used objective measures and subjective speech quality and intelligibility using a Chinese speech dataset. Moreover, new objective measur…

    Submitted 10 October, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

  4. arXiv:2211.06770

    cs.CV cs.LG eess.IV

    MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning

    Authors: Andrey Ignatov, Anastasia Sycheva, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc Van Gool

    Abstract: While neural networks-based photo processing solutions can provide a better image quality compared to the traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity. In this paper, we present a novel MicroISP model designed specifically for edge devices, taking into account their computational and memory limitations. The propo…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2211.06263

  5. arXiv:2211.06263

    cs.CV cs.LG eess.IV

    PyNet-V2 Mobile: Efficient On-Device Photo Processing With Neural Networks

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc Van Gool

    Abstract: The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address th…

    Submitted 8 November, 2022; originally announced November 2022.

  6. arXiv:2211.05256

    eess.IV cs.CV

    Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

    Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

  7. arXiv:2207.13965

    eess.AS cs.SD

    Extending RNN-T-based speech recognition systems with emotion and language classification

    Authors: Zvi Kons, Hagai Aronowitz, Edmilson Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon

    Abstract: Speech transcription, emotion recognition, and language identification are usually considered to be three different tasks. Each one requires a different model with a different architecture and training process. We propose using a recurrent neural network transducer (RNN-T)-based speech-to-text (STT) system as a common component that can be used for emotion recognition and language identification a…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted for publication in Interspeech 2022

  8. arXiv:2204.05188

    cs.CL cs.SD eess.AS

    Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

    Authors: Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury

    Abstract: Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) have been primarily due to effective pretraining of speech representations. One such pretraining paradigm is the distillation of semantic knowledge from state-of-the-art text-based models like BERT to speech encoder neural networks. This work is a step towards doing the same in a much more efficient and fine-grained manner whe…

    Submitted 1 July, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: 5 pages, 2 figures

  9. arXiv:2203.00006

    cs.CL cs.SD eess.AS

    Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems

    Authors: Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury, George Saon

    Abstract: The lack of speech data annotated with labels required for spoken language understanding (SLU) is often a major hurdle in building end-to-end (E2E) systems that can directly process speech inputs. In contrast, large amounts of text data with suitable labels are usually available. In this paper, we propose a novel text representation and training methodology that allows E2E SLU systems to be effect…

    Submitted 26 February, 2022; originally announced March 2022.

    Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. arXiv admin note: text overlap with arXiv:2202.13155

  10. arXiv:2202.13155

    cs.CL cs.SD eess.AS

    Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

    Authors: Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo

    Abstract: Compared to hybrid automatic speech recognition (ASR) systems that use a modular architecture in which each component can be independently adapted to a new domain, recent end-to-end (E2E) ASR systems are harder to customize due to their all-neural monolithic construction. In this paper, we propose a novel text representation and training framework for E2E ASR models. With this approach, we show tha…

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  11. arXiv:2202.10137

    cs.CL eess.AS

    A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets

    Authors: Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury

    Abstract: Intent classifiers are vital to the successful operation of virtual agent systems. This is especially so in voice activated systems where the data can be noisy with many ambiguous directions for user intents. Before operation begins, these classifiers are generally lacking in real-world training data. Active learning is a common approach used to help label large amounts of collected user input. Ho…

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  12. arXiv:2202.06684

    eess.AS cs.LG cs.SD

    Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

    Authors: Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng

    Abstract: The past few years have witnessed the significant advances of speech synthesis and voice conversion technologies. However, such technologies can undermine the robustness of broadly implemented biometric identification models and can be harnessed by in-the-wild attackers for illegal uses. The ASVspoof challenge mainly focuses on synthesized audios by advanced speech synthesis and voice conversion m…

    Submitted 15 February, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Submitted to ICASSP 2022

  13. arXiv:2201.12105

    cs.CL cs.LG cs.SD eess.AS

    Improving End-to-End Models for Set Prediction in Spoken Language Understanding

    Authors: Hong-Kwang J. Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon

    Abstract: The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition which aims to produce verbatim transcripts. Advances in end-to-end (E2E) speech modeling have made it possible to train solely on semantic entities, which are far cheaper to collect than verbatim transcripts. We focus on this set prediction problem, where entity…

    Submitted 28 January, 2022; originally announced January 2022.

    Comments: ICASSP © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    ACM Class: I.2.7

  14. arXiv:2112.02538

    eess.AS cs.SD

    Toward Real-World Voice Disorder Classification

    Authors: Heng-Cheng Kuo, Yu-Peng Hsieh, Huan-Hsin Tseng, Chi-Te Wang, Shih-Hau Fang, Yu Tsao

    Abstract: Objective: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Thus, automatic classification systems at home are desirable for people who lack access to clinical disease assessments. However, the performance of such systems may be weakened due to the constrained resour…

    Submitted 26 April, 2023; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: Accepted by IEEE TBME (under an IEEE Open Access publishing Agreement)

  15. arXiv:2108.08405

    cs.CL cs.SD eess.AS

    Integrating Dialog History into End-to-End Spoken Language Understanding Systems

    Authors: Jatin Ganhotra, Samuel Thomas, Hong-Kwang J. Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury

    Abstract: End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently. Spoken conversations, on the other hand, are very much context dependent, and dialog history contains useful information that can improve the processing of each conversational turn. In this paper, we inves…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: Interspeech 2021

  16. arXiv:2106.07953

    eess.SP cs.LG

    Learning to Compensate: A Deep Neural Network Framework for 5G Power Amplifier Compensation

    Authors: Po-Yu Chen, Hao Chen, Yi-Min Tsai, Hsien-Kai Kuo, Hantao Huang, Hsin-Hung Chen, Sheng-Hong Yan, Wei-Lun Ou, Chia-Ming Cheng

    Abstract: Owing to the complicated characteristics of 5G communication systems, designing RF components through mathematical modeling becomes a challenging obstacle. Moreover, such mathematical models need numerous manual adjustments for various specification requirements. In this paper, we present a learning-based framework to model and compensate Power Amplifiers (PAs) in 5G communication. In the proposed…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: IEEE International Conference on Communications (ICC) 2021

  17. arXiv:2105.07809

    eess.IV cs.CV cs.LG

    Learned Smartphone ISP on Mobile NPUs with Deep Learning, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Cheng-Ming Chiang, Hsien-Kai Kuo, Anastasia Sycheva, Radu Timofte, Min-Hung Chen, Man-Yu Lee, Yu-Syuan Xu, Yu Tseng, Shusong Xu, ** Guo, Chao-Hung Chen, Ming-Chun Hsyu, Wen-Chia Tsai, Chao-Wei Chen, Grigory Malivenko, Minsu Kwon, Myungje Lee, Jaeyoon Yoo, Changbeom Kang, Shinjo Wang, Zheng Shaolong, Hao Dejun, Xie Fen, Feng Zhuang, et al. (16 additional authors not shown)

    Abstract: As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos. In this Mobile AI challenge, the target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly r…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/

  18. arXiv:2104.05752

    cs.CL cs.LG cs.SD eess.AS

    Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

    Authors: Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson Morais

    Abstract: A major focus of recent research in spoken language understanding (SLU) has been on the end-to-end approach where a single model can predict intents directly from speech inputs without intermediate transcripts. However, this approach presents some challenges. First, since speech can be considered as personally identifiable information, in some cases only automatic speech recognition (ASR) transcri…

    Submitted 14 June, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted to Interspeech 2021

  19. arXiv:2104.03842

    cs.CL cs.LG cs.SD eess.AS

    RNN Transducer Models For Spoken Language Understanding

    Authors: Samuel Thomas, Hong-Kwang J. Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory

    Abstract: We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU). These end-to-end (E2E) models are constructed in three practical settings: a case where verbatim transcripts are available, a constrained case where the only available annotations are SLU labels and their values, and a more restrictive case where transcripts are available…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: To appear in the proceedings of ICASSP 2021

  20. arXiv:2012.10911

    eess.SP cs.LG

    Domain-adaptive Fall Detection Using Deep Adversarial Training

    Authors: Kai-Chun Liu, Michael Can, Heng-Cheng Kuo, Chia-Yeh Hsieh, Hsiang-Yun Huang, Chia-Tai Chan, Yu Tsao

    Abstract: Fall detection (FD) systems are important assistive technologies for healthcare that can detect emergency fall events and alert caregivers. However, it is not easy to obtain large-scale annotated fall events with various specifications of sensors or sensor positions during the implementation of accurate FD systems. Moreover, the knowledge obtained through machine learning has been restricted to ta…

    Submitted 14 June, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

    Comments: Accepted by IEEE Transactions on Neural Systems and Rehabilitation Engineering, 10 pages, 8 figures, 5 tables

  21. arXiv:2011.08238

    cs.CL cs.SD eess.AS

    End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features

    Authors: Edmilson Morais, Hong-Kwang J. Kuo, Samuel Thomas, Zoltan Tuske, Brian Kingsbury

    Abstract: Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further investigation. In this paper we introduce a modular End-to-End (E2E) SLU transformer network based architecture which allows the use of self-supervised p…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: 5 pages, 3 tables and 1 figure

  22. arXiv:2010.04284

    cs.CL cs.LG cs.SD eess.AS

    Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

    Authors: Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny

    Abstract: Training an end-to-end (E2E) neural network speech-to-intent (S2I) system that directly extracts intents from speech requires large amounts of intent-labeled speech data, which is time-consuming and expensive to collect. Initializing the S2I model with an ASR model trained on copious speech data can alleviate data sparsity. In this paper, we attempt to leverage NLU text resources. We implemented a…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: 5 pages, published in ICASSP 2020

    ACM Class: I.2.7

  23. arXiv:2009.14386

    cs.CL cs.LG cs.SD eess.AS

    End-to-End Spoken Language Understanding Without Full Transcripts

    Authors: Hong-Kwang J. Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras

    Abstract: An essential component of spoken language understanding (SLU) is slot filling: representing the meaning of a spoken utterance using semantic entity labels. In this paper, we develop end-to-end (E2E) spoken language understanding systems that directly convert speech input to semantic entities and investigate if these E2E SLU models can be trained solely on semantic entity annotations without word-f…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: 5 pages, to be published in Interspeech 2020

    ACM Class: I.2.7

  24. arXiv:2006.10296

    eess.AS cs.LG cs.SD

    Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

    Authors: Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-** Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

    Abstract: The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications. Therefore, our study applies a modified Transformer in a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, and hence, it is replaced by convolutional layers. To fur…

    Submitted 3 March, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted by APSIPA 2020

  25. arXiv:2004.12599

    cs.CV eess.IV

    Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency

    Authors: Cheng-Ming Chiang, Yu Tseng, Yu-Syuan Xu, Hsien-Kai Kuo, Yi-Min Tsai, Guan-Yu Chen, Koan-Sin Tan, Wei-Ting Wang, Yu-Chieh Lin, Shou-Yao Roy Tseng, Wei-Shiang Lin, Chia-Lin Yu, BY Shen, Kloze Kao, Chia-Ming Cheng, Hung-Jen Chen

    Abstract: Recently, image enhancement and restoration have become important applications on mobile devices, such as super-resolution and image deblurring. However, most state-of-the-art networks present extremely high computational complexity. This makes them difficult to deploy on mobile devices with acceptable latency. Moreover, when deploying to different mobile devices, there is a large latency var…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: CVPR 2020 Workshop on New Trends in Image Restoration and Enhancement (NTIRE)

  26. arXiv:2004.06965

    eess.IV cs.CV

    Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations

    Authors: Yu-Syuan Xu, Shou-Yao Roy Tseng, Yu Tseng, Hsien-Kai Kuo, Yi-Min Tsai

    Abstract: Deep Convolutional Neural Networks (CNNs) have achieved remarkable results on Single Image Super-Resolution (SISR). Despite considering only a single degradation, recent studies also include multiple degrading effects to better reflect real-world cases. However, most of the works assume a fixed combination of degrading effects, or even train an individual network for different combinations. Instea…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  27. arXiv:1909.12342

    eess.IV cs.CV eess.SP

    Compressed Sensing Microscopy with Scanning Line Probes

    Authors: Han-Wen Kuo, Anna E. Dorfi, Daniel V. Esposito, John N. Wright

    Abstract: In applications of scanning probe microscopy, images are acquired by raster scanning a point probe across a sample. Viewed from the perspective of compressed sensing (CS), this pointwise sampling scheme is inefficient, especially when the target image is structured. While replacing point measurements with delocalized, incoherent measurements has the potential to yield order-of-magnitude improvemen…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: 15 pages, 13 figures

  28. arXiv:1908.10959

    eess.SP cs.LG eess.IV math.OC stat.ML

    Short-and-Sparse Deconvolution -- A Geometric Approach

    Authors: Yenson Lau, Qing Qu, Han-Wen Kuo, Pengcheng Zhou, Yuqian Zhang, John Wright

    Abstract: Short-and-sparse deconvolution (SaSD) is the problem of extracting localized, recurring motifs in signals with spatial or temporal structure. Variants of this problem arise in applications such as image deblurring, microscopy, neural spike sorting, and more. The problem is challenging in both theory and practice, as natural optimization formulations are nonconvex. Moreover, practical deconvolution…

    Submitted 1 October, 2019; v1 submitted 28 August, 2019; originally announced August 2019.

    Comments: *YL and QQ contributed equally to this work; 30 figures, 45 pages; This version: added an experiment comparing with other methods, corrected typos and added references

  29. arXiv:1901.00256

    eess.SP cs.LG eess.IV math.OC

    Geometry and Symmetry in Short-and-Sparse Deconvolution

    Authors: Han-Wen Kuo, Yenson Lau, Yuqian Zhang, John Wright

    Abstract: We study the $\textit{Short-and-Sparse (SaS) deconvolution}$ problem of recovering a short signal $\mathbf a_0$ and a sparse signal $\mathbf x_0$ from their convolution. We propose a method based on nonconvex optimization, which under certain conditions recovers the target short and sparse signals, up to a signed shift symmetry which is intrinsic to this model. This symmetry plays a central role i…

    Submitted 11 April, 2019; v1 submitted 1 January, 2019; originally announced January 2019.

  30. arXiv:1806.00338

    eess.SP cs.IT math.OC stat.ML

    Structured Local Optima in Sparse Blind Deconvolution

    Authors: Yuqian Zhang, Han-Wen Kuo, John Wright

    Abstract: Blind deconvolution is a ubiquitous problem of recovering two unknown signals from their convolution. Unfortunately, this is an ill-posed problem in general. This paper focuses on the {\em short and sparse} blind deconvolution problem, where one unknown signal is short and the other is sparsely and randomly supported. This variant captures the structure of the unknown signals in several im…

    Submitted 21 July, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: 63 pages, 7 figures