Showing 1–19 of 19 results for author: Le, Q

Searching in archive eess.
  1. arXiv:2302.03917  [pdf, other]

    cs.SD cs.LG eess.AS

    Noise2Music: Text-conditioned Music Generation with Diffusion Models

    Authors: Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, Wei Han

    Abstract: We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts. Two types of diffusion models, a generator model, which generates an intermediate representation conditioned on text, and a cascader model, which generates high-fidelity audio conditioned on the intermediate representation and possibly the text, are trained and…

    Submitted 6 March, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 15 pages

  2. arXiv:2210.15897  [pdf, other]

    eess.IV cs.CV cs.GR

    Single-Image HDR Reconstruction by Multi-Exposure Generation

    Authors: Phuoc-Hieu Le, Quynh Le, Rang Nguyen, Binh-Son Hua

    Abstract: High dynamic range (HDR) imaging is an indispensable technique in modern photography. Traditional methods focus on HDR reconstruction from multiple images, solving the core problems of image alignment, fusion, and tone mapping, yet lacking a perfect solution due to ghosting and other visual artifacts in the reconstruction. Recent attempts at single-image HDR reconstruction show a promising alternat…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: WACV 2023 paper. 8 pages of content, 2 pages of references, 8 pages of supplementary material

  3. arXiv:2210.10879  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

    Authors: Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park

    Abstract: Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as…

    Submitted 24 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: 6 pages, accepted at SLT 2022. Updated with copyright

  4. arXiv:2202.12430  [pdf]

    eess.SY nlin.CD

    Koopman Spectral Analysis of Intermittent Dynamics in Complex Systems: A Case Study in Pathophysiological Processes of Obstructive Sleep Apnea

    Authors: Phat K. Huynh, Arveity R. Setty, Trung Q. Le

    Abstract: Complex systems, such as pathophysiological processes, commonly exhibit chaotic, nonlinear, and intermittent phenomena. Koopman operator theory and Hankel alternative view of Koopman (HAVOK) model have been widely used to decompose the chaos of the complex system dynamics into an intermittent forced linear system. Although the statistics of the intermittent forcing have been proposed to characteri…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: 28 pages, 9 figures, 1 table

  5. arXiv:2111.05761  [pdf]

    stat.AP eess.SP

    A Probabilistic Domain-knowledge Framework for Nosocomial Infection Risk Estimation of Communicable Viral Diseases in Healthcare Personnel: A Case Study for COVID-19

    Authors: Phat K. Huynh, Arveity R. Setty, Om P. Yadav, Trung Q. Le

    Abstract: Hospital-acquired infections of communicable viral diseases (CVDs) are posing a tremendous challenge to healthcare workers globally. Healthcare personnel (HCP) is facing a consistent risk of hospital-acquired infections, and subsequently higher rates of morbidity and mortality. We proposed a domain knowledge-driven infection risk model to quantify the individual HCP and the population-level health…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 10 pages, 4 figures, Journal of Biomedical and Health Informatics

  6. arXiv:2109.13226  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional author not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  7. arXiv:2102.05610  [pdf, other]

    cs.CV eess.IV

    Searching for Fast Model Families on Datacenter Accelerators

    Authors: Sheng Li, Mingxing Tan, Ruoming Pang, Andrew Li, Liqun Cheng, Quoc Le, Norman P. Jouppi

    Abstract: Neural Architecture Search (NAS), together with model scaling, has shown remarkable progress in designing high accuracy and fast convolutional architecture families. However, as neither NAS nor model scaling considers sufficient hardware architecture details, they do not take full advantage of the emerging datacenter (DC) accelerators. In this paper, we search for fast and accurate CNN model famil…

    Submitted 10 February, 2021; originally announced February 2021.

  8. arXiv:2012.11736  [pdf, ps, other]

    eess.SP

    Energy Efficiency Maximization in RIS-Aided Cell-Free Network with Limited Backhaul

    Authors: Quang Nhat Le, Van-Dinh Nguyen, Octavia A. Dobre, Ruiqin Zhao

    Abstract: Integrating the reconfigurable intelligent surface in a cell-free (RIS-CF) network is an effective solution to improve the capacity and coverage of future wireless systems with low cost and power consumption. The reflecting coefficients of RISs can be programmed to enhance signals received at users. This letter addresses a joint design of transmit beamformers at access points and reflecting coeffi…

    Submitted 8 March, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: submitted for possible publication

  9. Full-Duplex Non-Orthogonal Multiple Access Cooperative Overlay Spectrum-Sharing Networks with SWIPT

    Authors: Quang Nhat Le, Animesh Yadav, Nam-Phong Nguyen, Octavia A. Dobre, Ruiqin Zhao

    Abstract: This paper proposes a novel non-orthogonal multiple access (NOMA) assisted cooperative spectrum sharing network, in which one of the full-duplex (FD) secondary transmitters (STs) is chosen among many for forwarding the primary transmitter's and its own information to primary receiver and secondary receivers, respectively, using NOMA technique. To stimulate the ST to conduct cooperative transmissio…

    Submitted 19 November, 2020; originally announced November 2020.

    Comments: accepted for publication in the IEEE Transactions on Green Communications and Networking

  10. arXiv:2011.07549  [pdf, ps, other]

    eess.SP

    Learning-Assisted User Clustering in Cell-Free Massive MIMO-NOMA Networks

    Authors: Quang Nhat Le, Van-Dinh Nguyen, Nam-Phong Nguyen, Symeon Chatzinotas, Octavia A. Dobre, Ruiqin Zhao

    Abstract: The superior spectral efficiency (SE) and user fairness feature of non-orthogonal multiple access (NOMA) systems are achieved by exploiting user clustering (UC) more efficiently. However, a random UC certainly results in a suboptimal solution while an exhaustive search method comes at the cost of high complexity, especially for systems of medium-to-large size. To address this problem, we develop t…

    Submitted 15 November, 2020; originally announced November 2020.

    Comments: submitted for possible publication

  11. arXiv:2010.10504  [pdf, other]

    eess.AS cs.LG cs.SD

    Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

    Abstract: We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-e…

    Submitted 20 July, 2022; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: 11 pages, 3 figures, 5 tables. Accepted to NeurIPS SAS 2020 Workshop; v2: minor errors corrected

  12. arXiv:2008.06828  [pdf, other]

    cs.CV cs.LG eess.IV

    A novel approach to remove foreign objects from chest X-ray images

    Authors: Hieu X. Le, Phuong D. Nguyen, Thang H. Nguyen, Khanh N. Q. Le, Thanh T. Nguyen

    Abstract: We initially proposed a deep learning approach for foreign objects inpainting in smartphone-camera captured chest radiographs utilizing the cheXphoto dataset. Foreign objects which can significantly affect the quality of a computer-aided diagnostic prediction are captured under various settings. In this paper, we used multi-method to tackle both removal and inpainting chest radiographs. Firstly, a…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: 9 pages, 7 figures, 7 tables

  13. Improved Noisy Student Training for Automatic Speech Recognition

    Authors: Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le

    Abstract: Recently, a semi-supervised learning method known as "noisy student training" has been shown to improve image classification performance of deep networks significantly. Noisy student training is an iterative self-training method that leverages augmentation to improve network performance. In this work, we adapt and improve noisy student training for automatic speech recognition, employing (adaptive…

    Submitted 29 October, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: 5 pages, 5 figures, 4 tables; v2: minor revisions, reference added

    Journal ref: Proc. Interspeech 2020, 2817-2821

  14. arXiv:1912.05533  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    SpecAugment on Large Scale Datasets

    Authors: Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu

    Abstract: Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets. In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Naraya…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: 5 pages, 3 tables; submitted to ICASSP 2020

  15. arXiv:1912.05027  [pdf, other]

    cs.CV cs.LG eess.IV

    SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

    Authors: Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

    Abstract: Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a b…

    Submitted 17 June, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: CVPR 2020

  16. arXiv:1911.09070  [pdf, other]

    cs.CV cs.LG eess.IV

    EfficientDet: Scalable and Efficient Object Detection

    Authors: Mingxing Tan, Ruoming Pang, Quoc V. Le

    Abstract: Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multiscale feature fusion; Second, we propose a compound scal…

    Submitted 27 July, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: CVPR 2020

    Journal ref: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)

  17. arXiv:1910.04971  [pdf, other]

    cs.RO eess.SY

    Autonomous Shuttles for Last-Mile Connectivity

    Authors: Garrison Neel, Amir Darwesh, Quang Le, Srikanth Saripalli

    Abstract: This paper describes an autonomous shuttle which targets providing last-mile transportation. Often, this involves operation in crowded areas with high levels of pedestrian traffic, and little to no lane markings or traffic control. We aim to create a functional shuttle to be improved upon in the future as new robust solutions are developed to replace the current components. An initial implementati…

    Submitted 11 October, 2019; originally announced October 2019.

  18. arXiv:1906.02940  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Selfie: Self-supervised Pretraining for Image Embedding

    Authors: Trieu H. Trinh, Minh-Thang Luong, Quoc V. Le

    Abstract: We introduce a pretraining technique called Selfie, which stands for SELFie supervised Image Embedding. Selfie generalizes the concept of masked language modeling of BERT (Devlin et al., 2019) to continuous data, such as images, by making use of the Contrastive Predictive Coding loss (Oord et al., 2018). Given masked-out patches in an input image, our method learns to select the correct patch, amo…

    Submitted 27 July, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

  19. arXiv:1904.08779  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

    Authors: Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le

    Abstract: We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech…

    Submitted 3 December, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: 5 pages, 3 figures, 6 tables; v3: references added

    Journal ref: Proc. Interspeech 2019, 2613-2617