Showing 1–35 of 35 results for author: Park, D

  1. arXiv:2406.16896  [pdf, other]

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  2. arXiv:2308.07788  [pdf, ps, other]

    eess.AS

    GIST-AiTeR Speaker Diarization System for VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023

    Authors: Dongkeon Park, Ji Won Kim, Kang Ryeol Kim, Do Hyun Lee, Hong Kook Kim

    Abstract: This report describes the submission system by the GIST-AiTeR team for the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23) Track 4. Our submission system focuses on implementing diverse speaker diarization (SD) techniques, including ResNet293 and MFA-Conformer with different combinations of segment and hop length. Then, those models are combined into an ensemble model. The ResNet293 and MF…

    Submitted 25 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: VoxSRC 2023 Track4

  3. arXiv:2307.10667  [pdf, other]

    eess.IV cs.CV

    Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors

    Authors: Haechang Lee, Dongwon Park, Wongi Jeong, Kijeong Kim, Hyunwoo Je, Dongil Ryu, Se Young Chun

    Abstract: As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions but may introdu…

    Submitted 20 July, 2023; originally announced July 2023.

  4. arXiv:2306.08133  [pdf, ps, other]

    eess.AS cs.CL

    Large-scale Language Model Rescoring on Long-form Data

    Authors: Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley

    Abstract: In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets and a reduction of up to 30% relative on Salient Term Error Rate (STER)…

    Submitted 5 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted in ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  5. arXiv:2303.01037  [pdf, other]

    cs.CL cs.SD eess.AS

    Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

    Authors: Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, et al. (2 additional authors not shown)

    Abstract: We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quant…

    Submitted 24 September, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 20 pages, 7 figures, 8 tables

  6. arXiv:2302.03917  [pdf, other]

    cs.SD cs.LG eess.AS

    Noise2Music: Text-conditioned Music Generation with Diffusion Models

    Authors: Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, Wei Han

    Abstract: We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts. Two types of diffusion models, a generator model, which generates an intermediate representation conditioned on text, and a cascader model, which generates high-fidelity audio conditioned on the intermediate representation and possibly the text, are trained and…

    Submitted 6 March, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 15 pages

  7. arXiv:2212.05936  [pdf]

    cs.CV eess.IV

    Encoder-Decoder Network with Guided Transmission Map: Architecture

    Authors: Le-Anh Tran, Dong-Chul Park

    Abstract: An insight into the architecture of the Encoder-Decoder Network with Guided Transmission Map (EDN-GTM), a novel and effective single image dehazing scheme, is presented in this paper. The EDN-GTM takes a conventional RGB hazy image in conjunction with the corresponding transmission map estimated by the dark channel prior (DCP) approach as inputs of the network. The EDN-GTM adopts an enhanced struc…

    Submitted 31 March, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: 3 pages, 2 figures, ASPAI 2022

  8. arXiv:2211.05910  [pdf, other]

    eess.IV cs.CV

    Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

    Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

  9. arXiv:2211.04470  [pdf, other]

    cs.CV eess.IV

    Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, et al. (14 additional authors not shown)

    Abstract: Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is very crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth es…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.08630, arXiv:2211.03885; text overlap with arXiv:2105.08819, arXiv:2105.08826, arXiv:2105.08629, arXiv:2105.07809, arXiv:2105.07825

  10. arXiv:2210.10879  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

    Authors: Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park

    Abstract: Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as…

    Submitted 24 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: 6 pages, accepted at SLT 2022. Updated with copyright

  11. arXiv:2209.10357  [pdf, other]

    eess.AS

    GIST-AiTeR System for the Diarization Task of the 2022 VoxCeleb Speaker Recognition Challenge

    Authors: Dongkeon Park, Yechan Yu, Kyeong Wan Park, Ji Won Kim, Hong Kook Kim

    Abstract: This report describes the submission system of the GIST-AiTeR team at the 2022 VoxCeleb Speaker Recognition Challenge (VoxSRC) Track 4. Our system mainly includes speech enhancement, voice activity detection, multi-scaled speaker embedding, probabilistic linear discriminant analysis-based speaker clustering, and overlapped speech detection models. We first construct four different diarization sys…

    Submitted 6 October, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 2022 VoxSRC Track4

  12. arXiv:2209.09217  [pdf, other]

    cs.RO eess.SY

    WiForceSticker: Batteryless, Thin Sticker-like Flexible Force Sensor

    Authors: Agrim Gupta, Daegue Park, Shayaun Bashar, Cedric Girerd, Tania Morimoto, Dinesh Bharadia

    Abstract: Any two objects in contact with each other exert a force that could be simply due to gravity or mechanical contact, such as a robotic arm gripping an object or even the contact between two bones at our knee joints. The ability to naturally measure and monitor these contact forces allows a plethora of applications from warehouse management (detect faulty packages based on weights) to robotics (maki…

    Submitted 19 September, 2022; originally announced September 2022.

  13. arXiv:2208.07552  [pdf]

    eess.IV cs.CV cs.LG

    Coil2Coil: Self-supervised MR image denoising using phased-array coil images

    Authors: Juhyung Park, Dongwon Park, Hyeong-Geol Shin, Eun-Jung Choi, Hongjun An, Minjun Kim, Dongmyung Shin, Se Young Chun, Jongho Lee

    Abstract: Denoising of magnetic resonance images is beneficial in improving the quality of low signal-to-noise ratio images. Recently, denoising using deep neural networks has demonstrated promising results. Most of these networks, however, utilize supervised learning, which requires large training images of noise-corrupted and clean image pairs. Obtaining training images, particularly clean images, is expe…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: 9 pages, 5 figures

  14. Approximate Extraction of Late-Time Returns via Morphological Component Analysis

    Authors: Geoff Goehle, Benjamin Cowen, Thomas E. Blanford, J. Daniel Park, Daniel C. Brown

    Abstract: A fundamental challenge in acoustic data processing is to separate a measured time series into relevant phenomenological components. A given measurement is typically assumed to be an additive mixture of myriad signals plus noise whose separation forms an ill-posed inverse problem. In the setting of sensing elastic objects using active sonar, we wish to separate the early-time returns (e.g., return…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: 18 pages, 17 figures

  15. arXiv:2205.04821  [pdf, other]

    eess.IV cs.CV

    Self-supervised regression learning using domain knowledge: Applications to improving self-supervised denoising in imaging

    Authors: Il Yong Chun, Dongwon Park, Xuehang Zheng, Se Young Chun, Yong Long

    Abstract: Regression that predicts continuous quantity is a central part of applications using computational imaging and computer vision technologies. Yet, studying and understanding self-supervised learning for regression tasks - except for a particular regression task, image denoising - have lagged behind. This paper proposes a general self-supervised regression learning (SSRL) framework that enables lear…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: 17 pages, 16 figures, 2 tables, submitted to IEEE T-IP

  16. arXiv:2204.11669  [pdf]

    eess.IV cs.AI physics.med-ph

    Deep-learning-enabled Brain Hemodynamic Mapping Using Resting-state fMRI

    Authors: Xirui Hou, Pengfei Guo, Puyang Wang, Peiying Liu, Doris D. M. Lin, Hongli Fan, Yang Li, Zhiliang Wei, Zixuan Lin, Dengrong Jiang, Jin Jin, Catherine Kelly, Jay J. Pillai, Judy Huang, Marco C. Pinho, Binu P. Thomas, Babu G. Welch, Denise C. Park, Vishal M. Patel, Argye E. Hillis, Hanzhang Lu

    Abstract: Cerebrovascular disease is a leading cause of death globally. Prevention and early intervention are known to be the most effective forms of its management. Non-invasive imaging methods hold great promises for early stratification, but at present lack the sensitivity for personalized prognosis. Resting-state functional magnetic resonance imaging (rs-fMRI), a powerful tool previously used for mappin…

    Submitted 25 April, 2022; originally announced April 2022.

    Journal ref: npj Digital Medicine (2023) 116

  17. arXiv:2204.08418  [pdf, ps, other]

    eess.SP

    Enveloped Sinusoid Parseval Frames

    Authors: Geoff Goehle, Benjamin Cowen, J. Daniel Park, Daniel C. Brown

    Abstract: This paper presents a method of constructing Parseval frames from any collection of complex envelopes. The resulting Enveloped Sinusoid Parseval (ESP) frames can represent a wide variety of signal types as specified by their physical morphology. Since the ESP frame retains its Parseval property even when generated from a variety of envelopes, it is compatible with large scale and iterative optimiz…

    Submitted 18 April, 2022; originally announced April 2022.

  18. arXiv:2112.12296  [pdf, other]

    cs.IT eess.SP

    Sub-Chain Beam for mmWave Devices: A Trade-off between Power Saving and Beam Correspondence

    Authors: Jianhua Mo, Daehee Park, Boon Loong Ng, Vutha Va, Anum Ali, Chonghwa Seo, Jianzhong Charlie Zhang

    Abstract: Beam correspondence, or downlink-uplink (DL-UL) beam reciprocity, refers to the assumption that the best beams in the DL are also the best beams in the UL. This is an important assumption that allows the existing beam management framework in 5G to rely heavily on DL beam sweeping and avoid UL beam sweeping: UL beams are inferred from the measurements of the DL reference signals. Beam correspondenc…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 6 pages, 7 figures, accepted by Asilomar conference 2021

  19. arXiv:2111.09051  [pdf]

    eess.SY

    Implementation of Noise-Shaped Signaling System through Software-Defined Radio

    Authors: Junsung Choi, Dongryul Park, Suil Kim, Seungyoung Ahn

    Abstract: As developments of electromagnetic weapons, Electronic Warfare (EW) has been rising as the future form of war. Especially in wireless communications, the high security defense systems, such as Low Probability of Detection (LPD), Low Probability of Interception (LPI), or Low Probability of Exploitation (LPE) communication algorithms, are studied to prevent the military force loss. One of the LPD,…

    Submitted 17 November, 2021; originally announced November 2021.

  20. arXiv:2110.07116  [pdf, other]

    eess.AS cs.SD

    Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization

    Authors: Yechan Yu, Dongkeon Park, Hong Kook Kim

    Abstract: End-to-end neural diarization (EEND) with self-attention directly predicts speaker labels from inputs and enables the handling of overlapped speech. Although the EEND outperforms clustering-based speaker diarization (SD), it cannot be further improved by simply increasing the number of encoder blocks because the last encoder block is dominantly supervised compared with lower blocks. This paper pro…

    Submitted 26 September, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022, equal contribution from first two authors

  21. Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

    Authors: Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

    Abstract: Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech. In this work, we introduce a new state-of-the-art paralinguistic representation derived from large-scale, fully self-supervised training of a 600M+ parameter Conformer-based architecture. We…

    Submitted 13 December, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Journal ref: ICASSP 2022-2022 IEEE

  22. arXiv:2109.13226  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional authors not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  23. arXiv:2103.12789  [pdf, other]

    physics.optics eess.IV

    Single pixel structured imaging through fog

    Authors: Mark Bashkansky, Samuel D. Park, John Reintjes

    Abstract: We describe the application of structured imaging with a single pixel camera to imaging through fog. We demonstrate the use of a high-pass filter on the detected bucket signals to suppress the effects of temporal variations of fog density and enable an effective reconstruction of the image. A quantitative analysis and comparison of several high-pass filters are demonstrated for the application. Bo…

    Submitted 23 March, 2021; originally announced March 2021.

  24. arXiv:2011.06110  [pdf, other]

    eess.AS cs.SD

    Efficient Knowledge Distillation for RNN-Transducer Models

    Authors: Sankaran Panchapagesan, Daniel S. Park, Chung-Cheng Chiu, Yuan Shangguan, Qiao Liang, Alexander Gruenstein

    Abstract: Knowledge Distillation is an effective method of transferring knowledge from a large model to a smaller model. Distillation can be viewed as a type of model compression, and has played an important role for on-device ASR applications. In this paper, we develop a distillation method for RNN-Transducer (RNN-T) models, a popular end-to-end neural network architecture for streaming speech recognition.…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: 5 pages, 1 figure, 2 tables; submitted to ICASSP 2021

  25. arXiv:2010.10504  [pdf, other]

    eess.AS cs.LG cs.SD

    Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

    Abstract: We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-e…

    Submitted 20 July, 2022; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: 11 pages, 3 figures, 5 tables. Accepted to NeurIPS SAS 2020 Workshop; v2: minor errors corrected

  26. arXiv:2007.01950  [pdf]

    physics.med-ph eess.IV

    Ultra-high spatial resolution BOLD fMRI in humans using combined segmented-accelerated VFA-FLEET with a recursive RF pulse design

    Authors: Avery J. L. Berman, William A. Grissom, Thomas Witzel, Shahin Nasr, Daniel J. Park, Kawin Setsompop, Jonathan R. Polimeni

    Abstract: Purpose: To alleviate the spatial encoding limitations of single-shot EPI by developing multi-shot segmented EPI for ultra-high-resolution fMRI with reduced ghosting artifacts from subject motion and respiration. Methods: Segmented EPI can reduce readout duration and reduce acceleration factors, however, the time elapsed between segment acquisitions (on the order of seconds) can result in inte…

    Submitted 3 July, 2020; originally announced July 2020.

    Comments: 51 pages (including supplement), 8 main figures, 6 supporting figures. For supporting videos (8), please visit https://github.com/aveberman/vfa-fleet. Note: this work has been accepted for publication at Magnetic Resonance in Medicine

  27. Improved Noisy Student Training for Automatic Speech Recognition

    Authors: Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le

    Abstract: Recently, a semi-supervised learning method known as "noisy student training" has been shown to improve image classification performance of deep networks significantly. Noisy student training is an iterative self-training method that leverages augmentation to improve network performance. In this work, we adapt and improve noisy student training for automatic speech recognition, employing (adaptive…

    Submitted 29 October, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: 5 pages, 5 figures, 4 tables; v2: minor revisions, reference added

    Journal ref: Proc. Interspeech 2020, 2817-2821

  28. arXiv:2002.09847  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Unsupervised Denoising for Satellite Imagery using Wavelet Subband CycleGAN

    Authors: Joonyoung Song, Jae-Heon Jeong, Dae-Soon Park, Hyun-Ho Kim, Doo-Chun Seo, Jong Chul Ye

    Abstract: Multi-spectral satellite imaging sensors acquire various spectral band images such as red (R), green (G), blue (B), near-infrared (N), etc. Thanks to the unique spectroscopic property of each spectral band with respect to the objects on the ground, multi-spectral satellite imagery can be used for various geological survey applications. Unfortunately, image artifacts from imaging sensor noises o…

    Submitted 23 February, 2020; originally announced February 2020.

  29. arXiv:1912.05533  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    SpecAugment on Large Scale Datasets

    Authors: Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu

    Abstract: Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has shown to be highly effective in enhancing the performance of end-to-end networks on public datasets. In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Naraya…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: 5 pages, 3 tables; submitted to ICASSP 2020

  30. arXiv:1911.07410  [pdf, other]

    eess.IV cs.CV

    Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training

    Authors: Dongwon Park, Dong Un Kang, Jisoo Kim, Se Young Chun

    Abstract: Multi-scale (MS) approaches have been widely investigated for blind single image / video deblurring that sequentially recovers deblurred images in low spatial scale first and then in high spatial scale later with the output of lower scales. MS approaches have been effective especially for severe blurs induced by large motions in high spatial scale since those can be seen as small blurs in low spat…

    Submitted 17 November, 2019; originally announced November 2019.

    Comments: 10 pages, 8 figures, 6 tables, work in progress

  31. arXiv:1910.14211  [pdf]

    physics.med-ph eess.IV

    Accelerated spin-echo fMRI using Multisection Excitation by Simultaneous Spin-echo Interleaving (MESSI) with complex-encoded generalized SLIce Dithered Enhanced Resolution (cgSlider) Simultaneous Multi-Slice Echo-Planar Imaging

    Authors: SoHyun Han, Congyu Liao, Mary Kate Manhard, Daniel Joseph Park, Berkin Bilgic, Merlin J. Fair, Fuyixue Wang, Anna I. Blazejewska, William A. Grissom, Jonathan R. Polimeni, Kawin Setsompop

    Abstract: Spin-echo functional MRI (SE-fMRI) has the potential to improve spatial specificity when compared to gradient-echo fMRI. However, high spatiotemporal resolution SE-fMRI with large slice-coverage is challenging as SE-fMRI requires a long echo time (TE) to generate blood oxygenation level-dependent (BOLD) contrast, leading to long repetition times (TR). The aim of this work is to develop an acquisit…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: 38 pages, 9 figures, ISMRM2019 #1165

  32. arXiv:1909.11915  [pdf]

    cs.CV cs.LG eess.IV

    Unsupervised Image Translation using Adversarial Networks for Improved Plant Disease Recognition

    Authors: Haseeb Nazki, Sook Yoon, Alvaro Fuentes, Dong Sun Park

    Abstract: Acquisition of data in task-specific applications of machine learning like plant disease recognition is a costly endeavor owing to the requirements of professional human diligence and time constraints. In this paper, we present a simple pipeline that uses GANs in an unsupervised image translation environment to improve learning with respect to the data distribution in a plant disease dataset, redu…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: 20 pages, 11 figures, 3 tables, article under review

  33. arXiv:1907.06834  [pdf, other]

    eess.SP

    Noise Removal of FTIR Hyperspectral Images via MMSE

    Authors: Chang Sik Lee, Hyeong Geun Yu, Dong Jo Park, Dong Eui Chang, Hyunwoo Nam, Byeong Hwang Park

    Abstract: Fourier transform infrared (FTIR) hyperspectral imaging systems are deployed in various fields where spectral information is exploited. Chemical warfare agent (CWA) detection is one of such fields and it requires a fast and accurate process from the measurement to the visualization of detection results, including noise removal. A general concern of existing noise removal algorithms is a trade-off…

    Submitted 29 December, 2019; v1 submitted 16 July, 2019; originally announced July 2019.

  34. arXiv:1904.08779  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

    Authors: Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le

    Abstract: We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech…

    Submitted 3 December, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: 5 pages, 3 figures, 6 tables; v3: references added

    Journal ref: Proc. Interspeech 2019, 2613-2617

  35. arXiv:1902.06562  [pdf, other]

    cs.LG eess.SP stat.ML

    Intra- and Inter-epoch Temporal Context Network (IITNet) Using Sub-epoch Features for Automatic Sleep Scoring on Raw Single-channel EEG

    Authors: Hogeon Seo, Seunghyeok Back, Seongju Lee, Deokhwan Park, Tae Kim, Kyoobin Lee

    Abstract: A deep learning model, named IITNet, is proposed to learn intra- and inter-epoch temporal contexts from raw single-channel EEG for automatic sleep scoring. To classify the sleep stage from half-minute EEG, called an epoch, sleep experts investigate sleep-related events and consider the transition rules between the found events. Similarly, IITNet extracts representative features at a sub-epoch leve…

    Submitted 10 June, 2020; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: First three authors contributed equally to this work; Accepted manuscript for Biomedical Signal Processing and Control (BSPC); 12 pages, 6 figures;