
Showing 1–50 of 52 results for author: Sim

Searching in archive eess.
  1. arXiv:2404.19611

    eess.SP cs.ET cs.IT cs.NI

    Radio Resource Management Design for RSMA: Optimization of Beamforming, User Admission, and Discrete/Continuous Rates with Imperfect SIC

    Authors: L. F. Abanto-Leon, A. Krishnamoorthy, A. Garcia-Saavedra, G. H. Sim, R. Schober, M. Hollick

    Abstract: This paper investigates the radio resource management (RRM) design for multiuser rate-splitting multiple access (RSMA), accounting for various characteristics of practical wireless systems, such as the use of discrete rates, the inability to serve all users, and the imperfect successive interference cancellation (SIC). Specifically, failure to consider these characteristics in RRM design may lead…

    Submitted 30 April, 2024; originally announced April 2024.

  2. arXiv:2403.19709

    eess.AS cs.AI cs.CL cs.LG cs.NE

    Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

    Authors: Tsendsuren Munkhdalai, Youzheng Chen, Khe Chai Sim, Fadi Biadsy, Tara Sainath, Pedro Moreno Mengibar

    Abstract: Parameter-efficient adaptation methods have become a key mechanism to train large pre-trained models for downstream tasks. However, their per-task parameter overhead is still considered high when the number of downstream tasks to adapt for is large. We introduce an adapter module that is more efficient in large-scale multi-task adaptation scenarios. Our adapter is hierarchical in terms of how…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures, 5 tables

  3. arXiv:2401.15313

    cs.RO cs.CV eess.SY math.OC

    Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization

    Authors: Kihoon Shin, Hyunjae Sim, Seungwon Nam, Yonghee Kim, Jae Hu, Kwang-Ki K. Kim

    Abstract: In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose es…

    Submitted 4 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: 20 pages, 21 figures

    MSC Class: 93C85; 93E11; 93E24; 90C26; 93E10; 62M20

  4. arXiv:2310.17954

    eess.IV cs.CV

    Multivessel Coronary Artery Segmentation and Stenosis Localisation using Ensemble Learning

    Authors: Muhammad Bilal, Dinis Martinho, Reiner Sim, Adnan Qayyum, Hunaid Vohra, Massimo Caputo, Taofeek Akinosho, Sofiat Abioye, Zaheer Khan, Waleed Niaz, Junaid Qadir

    Abstract: Coronary angiography analysis is a common clinical task performed by cardiologists to diagnose coronary artery disease (CAD) through an assessment of atherosclerotic plaque's accumulation. This study introduces an end-to-end machine learning solution developed as part of our solution for the MICCAI 2023 Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs (ARCA…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Submission report for the ARCADE challenge hosted at MICCAI 2023

  5. arXiv:2310.00178

    cs.CL eess.AS

    Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

    Authors: Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

    Abstract: Contextual biasing refers to the problem of biasing automatic speech recognition (ASR) systems towards rare entities that are relevant to specific users or application scenarios. We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching. During beam search, we boost the score of a token extension if it extends matching into a set of biasing…

    Submitted 29 September, 2023; originally announced October 2023.
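    The abstract names the Knuth-Morris-Pratt (KMP) algorithm as the basis for biasing during beam search. As a rough illustration of that building block only, and not the authors' implementation, the sketch below builds the KMP failure table over a tokenized biasing phrase and advances a partial-match state one token at a time, the way a beam hypothesis could carry it. The helper names and the bonus rule (`weight` times the change in matched length) are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def failure_function(pattern: Sequence[str]) -> List[int]:
    """KMP failure table: fail[i] is the length of the longest proper prefix
    of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def advance(pattern: Sequence[str], fail: List[int], state: int, token: str) -> int:
    """Consume one token; return the new partial-match length (0..len(pattern))."""
    # After a full match, or on a mismatch, fall back along the failure links
    # instead of restarting from scratch (this is what makes KMP linear-time).
    while state > 0 and (state == len(pattern) or pattern[state] != token):
        state = fail[state - 1]
    if state < len(pattern) and pattern[state] == token:
        state += 1
    return state


def biasing_bonus(prev_state: int, new_state: int, weight: float = 1.0) -> float:
    """Hypothetical score adjustment: reward progress into the biasing phrase
    and take the reward back when the partial match is lost."""
    return weight * (new_state - prev_state)


def match_lengths(pattern: Sequence[str], tokens: Sequence[str]) -> List[int]:
    """Partial-match length after each token of a hypothesis."""
    fail = failure_function(pattern)
    state, out = 0, []
    for tok in tokens:
        state = advance(pattern, fail, state, tok)
        out.append(state)
    return out


if __name__ == "__main__":
    phrase = list("ababc")                         # toy biasing phrase of character tokens
    print(failure_function(phrase))                # [0, 0, 1, 2, 0]
    print(match_lengths(phrase, list("abababc")))  # [1, 2, 3, 4, 3, 4, 5]
```

    Because the per-step bonus is a difference of match lengths, the accumulated bonus along a hypothesis telescopes to `weight` times its current match length, so a hypothesis that abandons a phrase midway loses the boost it had collected.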

  6. arXiv:2309.12963

    eess.AS cs.SD

    Massive End-to-end Models for Short Search Queries

    Authors: Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

    Abstract: In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters. The encoders of our models use the neural architecture of Google's universal speech model (USM), with additional funnel pooling layers to signifi…

    Submitted 22 September, 2023; originally announced September 2023.

  7. arXiv:2309.09996

    eess.AS cs.CL cs.LG cs.SD

    Improving Speech Recognition for African American English With Audio Classification

    Authors: Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar

    Abstract: Automatic speech recognition (ASR) systems have been shown to have large quality disparities between the language varieties they are intended or expected to recognize. One way to mitigate this is to train or fine-tune models with more representative datasets. But this approach can be hindered by limited in-domain data for training and evaluation. We propose a new way to improve the robustness of a…

    Submitted 16 September, 2023; originally announced September 2023.

  8. arXiv:2306.01789

    cs.SD cs.CL eess.AS

    Edit Distance based RL for RNNT decoding

    Authors: Dongseong Hwang, Changwan Ryu, Khe Chai Sim

    Abstract: RNN-T is currently considered the industry standard in ASR due to its exceptional WERs in various benchmark tests and its ability to support seamless streaming and longform transcription. However, its biggest drawback lies in the significant discrepancy between its training and inference objectives. During training, RNN-T maximizes all alignment probabilities by teacher forcing, while during infer…

    Submitted 14 July, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures
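    The title indicates that edit distance drives the reinforcement-learning signal used to close RNN-T's train/inference gap. As a hedged sketch of that ingredient only, the code below computes word-level Levenshtein distance and word error rate (WER), and defines a hypothetical per-hypothesis reward as the negative edit distance. The truncated abstract does not say how the paper actually shapes its reward, so `reward` is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitution, insertion, deletion each cost 1),
    computed with a rolling single-row dynamic program."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (free when the words match)
            )
        prev = cur
    return prev[-1]


def word_error_rate(ref: str, hyp: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def reward(ref: str, hyp: str) -> float:
    """Hypothetical RL reward: fewer word errors means a higher reward."""
    return -float(edit_distance(ref.split(), hyp.split()))


if __name__ == "__main__":
    # One substitution (cat -> bat) and one deletion (on): 2 errors / 4 words.
    print(word_error_rate("the cat sat on", "the bat sat"))  # 0.5
```

    Unlike the per-frame likelihood maximized under teacher forcing, this quantity is computed on a complete decoded hypothesis, which is why it is a natural candidate for a sequence-level reward.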

  9. arXiv:2305.13108

    eess.AS cs.CL cs.LG cs.SD

    Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test

    Authors: Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee

    Abstract: Automatic speech recognition systems based on deep learning are mainly trained under empirical risk minimization (ERM). Since ERM utilizes the averaged performance on the data samples regardless of group membership (e.g., healthy or dysarthric speakers), ASR systems are unaware of the performance disparities across the groups. This results in biased ASR systems whose performance differences among groups ar…

    Submitted 27 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  10. arXiv:2302.01496

    cs.CL cs.LG cs.SD eess.AS

    Efficient Domain Adaptation for Speech Foundation Models

    Authors: Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

    Abstract: Foundation models (FMs), which are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have attracted considerable interest in the research community. Benefiting from diverse data sources such as different modalities, languages and application domains, foundation models have demonstrated strong generalization and knowledge transfer capabilities. In this paper, we presen…

    Submitted 2 February, 2023; originally announced February 2023.

  11. arXiv:2211.16653

    cs.LG cs.AI eess.SP

    CRU: A Novel Neural Architecture for Improving the Predictive Performance of Time-Series Data

    Authors: Sunghyun Sim, Dohee Kim, Hyerim Bae

    Abstract: The time-series forecasting (TSF) problem is a traditional problem in the field of artificial intelligence. Models such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) have contributed to improving the predictive accuracy of TSF. Furthermore, model structures have been proposed to combine time-series decomposition methods, such as seasonal-trend dec…

    Submitted 6 February, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

  12. arXiv:2211.11557

    eess.IV cs.CV cs.LG

    Decomposing 3D Neuroimaging into 2+1D Processing for Schizophrenia Recognition

    Authors: Mengjiao Hu, Xudong Jiang, Kang Sim, Juan Helen Zhou, Cuntai Guan

    Abstract: Deep learning has been successfully applied to recognizing both natural images and medical images. However, there remains a gap in recognizing 3D neuroimaging data, especially for psychiatric diseases such as schizophrenia and depression that have no visible alteration in specific slices. In this study, we propose to process the 3D data by a 2+1D framework so that we can exploit the powerful deep…

    Submitted 21 November, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

  13. arXiv:2211.02712

    cs.LG cs.SD eess.AS

    Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion

    Authors: Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

    Abstract: Self-supervised pre-training of a speech foundation model, followed by supervised fine-tuning, has shown impressive quality improvements on automatic speech recognition (ASR) tasks. Fine-tuning separate foundation models for many downstream tasks is expensive since the foundation model is usually very big. Parameter-efficient fine-tuning methods (e.g., adapter, sparse update methods) offer an alte…

    Submitted 4 November, 2022; originally announced November 2022.

  14. arXiv:2210.17143

    cs.SD cs.CL eess.AS

    Exploring Train and Test-Time Augmentations for Audio-Language Learning

    Authors: Eungbeom Kim, Jinhee Kim, Yoori Oh, Kyungsu Kim, Minju Park, Jaeheon Sim, Jinwoo Lee, Kyogu Lee

    Abstract: In this paper, we aim to unveil the impact of data augmentation in audio-language multi-modal learning, which has not been explored despite its importance. We explore various augmentation methods not only at train time but also at test time and find that proper data augmentation can lead to substantial improvements. Specifically, applying our proposed audio-language paired augmentation PairMix, w…

    Submitted 23 May, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures

  15. arXiv:2210.12216

    cs.LG cs.AI eess.SP

    Feature Engineering and Classification Models for Partial Discharge in Power Transformers

    Authors: Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo

    Abstract: To ensure reliability, power transformers are monitored for partial discharge (PD) events, which are symptoms of transformer failure. Since failures can have catastrophic cascading consequences, it is critical to preempt them as early as possible. Our goal is to classify PDs as corona, floating, particle, or void, to gain an understanding of the failure location. Using phase resolved PD signal dat…

    Submitted 21 October, 2022; originally announced October 2022.

  16. Enemy Spotted: in-game gun sound dataset for gunshot classification and localization

    Authors: Junwoo Park, Youngwoo Cho, Gyuhyeon Sim, Hojoon Lee, Jaegul Choo

    Abstract: Recently, deep learning-based methods have drawn considerable attention in sound classification and localization tasks because they are simple yet achieve high performance without domain knowledge. However, a lack of gun sounds in existing datasets has been a major obstacle to implementing a support system to spot criminals from their gunshots by leveraging deep learning models. Since the occurrence of gunshot is rar…

    Submitted 16 February, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted at IEEE Conference on Games (CoG) 2022

  17. arXiv:2210.05793

    cs.LG cs.CL cs.SD eess.AS

    Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR

    Authors: Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman

    Abstract: Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data. In this paper, we focus on knowledge distillation for the RNN-T model, which is widely used in state-of-the-art (SoTA) automatic speech recognition (ASR). Specifically, we compared using soft and hard target distillation to train l…

    Submitted 28 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 8 pages, 2 figures

  18. arXiv:2208.03067

    cs.CL cs.SD eess.AS

    Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

    Authors: Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim

    Abstract: Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open source data…

    Submitted 4 October, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  19. arXiv:2207.00706

    eess.AS cs.CL cs.LG

    UserLibri: A Dataset for ASR Personalization Using Only Text

    Authors: Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey

    Abstract: Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech co…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted for publication in Interspeech 2022. 9 total pages with appendix, 9 total tables, 5 total figures

  20. arXiv:2206.10284

    eess.SP

    Analog Self-Interference Cancellation with Practical RF Components for Full-Duplex Radios

    Authors: Jong Woo Kwak, Min Soo Sim, In-Woong Kang, Jaedon Park, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: One of the main obstacles in full-duplex radios is analog-to-digital converter (ADC) saturation at the receiver due to strong self-interference (SI). To solve this issue, researchers have proposed two different types of analog self-interference cancellation (SIC) methods: (i) passive suppression and (ii) regeneration-and-subtraction of SI. For the latter case, the tunable RF component, such as a…

    Submitted 21 June, 2022; originally announced June 2022.

  21. Myocardial Segmentation of Late Gadolinium Enhanced MR Images by Propagation of Contours from Cine MR Images

    Authors: Dong Wei, Ying Sun, Ping Chai, Adrian Low, Sim Heng Ong

    Abstract: Automatic segmentation of myocardium in Late Gadolinium Enhanced (LGE) Cardiac MR (CMR) images is often difficult due to the intensity heterogeneity resulting from accumulation of contrast agent in infarcted areas. In this paper, we propose an automatic segmentation framework that fully utilizes shared information between corresponding cine and LGE images of the same patient. Given myocardial contou…

    Submitted 21 May, 2022; originally announced May 2022.

    Comments: MICCAI 2011

  22. arXiv:2205.09703

    cs.LG cs.DC cs.PF eess.SY stat.AP

    Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow

    Authors: Jeeyung Kim, Mengtian Jin, Youkow Homma, Alex Sim, Wilko Kroeger, Kesheng Wu

    Abstract: In modeling time series data, we often need to augment the existing data records to increase the modeling accuracy. In this work, we describe a number of techniques to extract dynamic information about the current state of a large scientific workflow, which could be generalized to other types of applications. The specific task to be modeled is the time needed for transferring a file from an experi…

    Submitted 19 May, 2022; originally announced May 2022.

  23. arXiv:2205.05598

    cs.DC cs.NI eess.SY

    Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches

    Authors: Julian Bellavita, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila

    Abstract: The XRootD system is used to transfer, store, and cache large datasets from high-energy physics (HEP). In this study, we focus on its capability as a distributed on-demand storage cache. Through exploring a large set of daily log files between 2020 and 2021, we seek to understand the data access patterns that might inform future cache design. Our study begins with a set of summary statistics regardin…

    Submitted 11 May, 2022; originally announced May 2022.

  24. arXiv:2201.10426

    cs.IT cs.NI eess.SP

    Sequential Parametric Optimization for Rate-Splitting Precoding in Non-Orthogonal Unicast and Multicast Transmissions

    Authors: Luis F. Abanto-Leon, Matthias Hollick, Bruno Clerckx, Gek Hong Sim

    Abstract: This paper investigates rate-splitting (RS) precoding for non-orthogonal unicast and multicast (NOUM) transmissions using fully-digital and hybrid precoders. We study the nonconvex weighted sum-rate (WSR) maximization problem subject to a multicast requirement. We propose FALCON, an approach based on sequential parametric optimization, to solve the aforementioned problem. We show that FALCON conve…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: 7 pages / ICC 2022

  25. arXiv:2201.10297

    eess.SP cs.IT cs.NI

    RadiOrchestra: Proactive Management of Millimeter-wave Self-backhauled Small Cells via Joint Optimization of Beamforming, User Association, Rate Selection, and Admission Control

    Authors: L. F. Abanto-Leon, A. Asadi, G. H. Sim, A. Garcia-Saavedra, M. Hollick

    Abstract: Millimeter-wave self-backhauled small cells are a key component of next-generation wireless networks. Their dense deployment will increase data rates, reduce latency, and enable efficient data transport between the access and backhaul networks, providing greater flexibility not previously possible with optical fiber. Despite their high potential, operating dense self-backhauled networks optimally…

    Submitted 13 July, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: 19 pages

    Journal ref: IEEE Transactions on Wireless Communications, 2022

  26. arXiv:2112.05146

    eess.IV cs.CV cs.LG stat.ML

    Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction

    Authors: Hyungjin Chung, Byeongsu Sim, Jong Chul Ye

    Abstract: Diffusion models have recently attained significant interest within the community owing to their strong performance as generative models. Furthermore, their application to inverse problems has demonstrated state-of-the-art performance. Unfortunately, diffusion models have a critical downside: they are inherently slow to sample from, needing a few thousand steps of iteration to generate images from p…

    Submitted 19 March, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted to CVPR 2022

  27. arXiv:2111.08137

    cs.CL cs.LG cs.SD eess.AS

    Joint Unsupervised and Supervised Training for Multilingual ASR

    Authors: Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

    Abstract: Self-supervised training has shown promising gains in pretraining models and facilitating the downstream finetuning for speech recognition, like multilingual ASR. Most existing methods adopt a 2-stage scheme where the self-supervised loss is optimized in the first pretraining stage, and the standard supervised finetuning resumes in the second stage. In this paper, we propose an end-to-end (E2E) Jo…

    Submitted 15 November, 2021; originally announced November 2021.

  28. arXiv:2110.02220

    eess.AS cs.AI cs.CL cs.LG cs.NE

    Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition

    Authors: Tsendsuren Munkhdalai, Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Trevor Strohman, Françoise Beaufays

    Abstract: Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training, it can yield even better recognition results. However, traditional rescoring approaches based on an external language model are prone to diverging during personalized training. In this work, we introduce a model-based…

    Submitted 6 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, 3 tables

  29. arXiv:2110.00165

    eess.AS cs.CL cs.LG cs.SD

    Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

    Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

    Abstract: Self- and semi-supervised learning methods have been actively investigated to reduce labeled training data or enhance the model performance. However, these approaches mostly focus on in-domain performance for public datasets. In this study, we utilize the combination of self- and semi-supervised learning methods to solve the unseen domain adaptation problem in a large-scale production setting for online A…

    Submitted 15 February, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022, 5 pages, 2 figures, 5 tables

  30. arXiv:2110.00155

    cs.SD cs.LG eess.AS

    Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

    Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays

    Abstract: Streaming end-to-end speech recognition models have been widely applied to mobile devices and show significant improvement in efficiency. These models are typically trained on the server using transcribed speech data. However, the server data distribution can be very different from the data distribution on user devices, which could affect the model performance. There are two main challenges for on…

    Submitted 30 September, 2021; originally announced October 2021.

    Comments: 5 pages

  31. arXiv:2109.13226

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional authors not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  32. arXiv:2108.04313

    eess.SP cs.IT

    BEAMWAVE: Cross-Layer Beamforming and Scheduling for Superimposed Transmissions in Industrial IoT mmWave Networks

    Authors: Luis F. Abanto-Leon, Matthias Hollick, Gek Hong Sim

    Abstract: The omnipresence of IoT devices in Industry 4.0 is expected to foster higher reliability, safety, and efficiency. However, interconnecting a large number of wireless devices without jeopardizing the system performance proves challenging. To address the requirements of future industries, we investigate the cross-layer design of beamforming and scheduling for layered-division multiplexing (LDM) syst…

    Submitted 9 August, 2021; originally announced August 2021.

    Comments: 8 pages. Accepted at WiOpt 2021

    Journal ref: WiOpt 2021

  33. arXiv:2106.10259

    eess.AS cs.CL cs.LG cs.SD

    On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech

    Authors: Katrin Tomanek, Françoise Beaufays, Julie Cattiau, Angad Chandorkar, Khe Chai Sim

    Abstract: While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment posing problems around data privacy, de…

    Submitted 18 June, 2021; originally announced June 2021.

  34. An Open-Source Low-Cost Mobile Robot System with an RGB-D Camera and Efficient Real-Time Navigation Algorithm

    Authors: Taekyung Kim, Seunghyun Lim, Gwanjun Shin, Geonhee Sim, Dongwon Yun

    Abstract: Currently, mobile robots are developing rapidly and are finding numerous applications in industry. However, several problems remain related to their practical use, such as the need for expensive hardware and high power consumption levels. In this study, we build a low-cost indoor mobile robot platform that does not include a LiDAR or a GPU. Then, we design an autonomous navigation architecture…

    Submitted 13 December, 2022; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: Accepted to IEEE Access 2022. Project Github: https://github.com/shinkansan/2019-UGRP-DPoom Video: https://youtu.be/Li3-RlO28lk

    Journal ref: IEEE Access, vol. 10, pp. 127871-127881, 2022

  35. arXiv:2008.12967

    eess.IV cs.CV cs.LG stat.ML

    Unpaired Deep Learning for Accelerated MRI using Optimal Transport Driven CycleGAN

    Authors: Gyutaek Oh, Byeongsu Sim, Hyungjin Chung, Leonard Sunwoo, Jong Chul Ye

    Abstract: Recently, deep learning approaches for accelerated MRI have been extensively studied thanks to their high performance reconstruction in spite of significantly reduced runtime complexity. These neural networks are usually trained in a supervised manner, so matched pairs of subsampled and fully sampled k-space data are required. Unfortunately, it is often difficult to acquire matched fully sampled k…

    Submitted 29 August, 2020; originally announced August 2020.

    Comments: Accepted for IEEE Transactions on Computational Imaging

  36. arXiv:2008.07600

    eess.SP cs.IT

    SWAN: Swarm-Based Low-Complexity Scheme for PAPR Reduction

    Authors: Luis F. Abanto-Leon, Gek Hong Sim, Matthias Hollick, Amnart Boonkajay, Fumiyuki Adachi

    Abstract: Cyclically shifted partial transmit sequences (CS-PTS) has conventionally been used in SISO systems for PAPR reduction of OFDM signals. Compared to other techniques, CS-PTS attains superior performance. Nevertheless, due to the exhaustive search requirement, it demands excessive computational complexity. In this paper, we adapt CS-PTS to operate in a MIMO framework, where singular value decomposit…

    Submitted 15 September, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: IEEE GLOBECOM 2020
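    The abstract attributes the cost of CS-PTS to its exhaustive search. The sketch below is a single-antenna (SISO) illustration of conventional CS-PTS, not the SWAN swarm scheme or its MIMO/SVD adaptation, and shows where that cost comes from: subcarriers are split into adjacent subblocks, each subblock is transformed with an IFFT, and every combination of cyclic shifts is tried, so the search grows as S^(V-1) for S candidate shifts and V subblocks. The adjacent partition, block count, shift set, and function names are assumptions chosen for illustration.

```python
import itertools
from typing import Sequence, Tuple

import numpy as np


def papr_db(x: np.ndarray) -> float:
    """Peak-to-average power ratio of a time-domain signal, in dB."""
    power = np.abs(x) ** 2
    return float(10 * np.log10(power.max() / power.mean()))


def cs_pts(
    X: np.ndarray, num_blocks: int = 4, shifts: Sequence[int] = (0, 8, 16, 24)
) -> Tuple[np.ndarray, float, Tuple[int, ...]]:
    """Exhaustive-search CS-PTS for one frequency-domain OFDM symbol X.

    Returns the lowest-PAPR time-domain candidate, its PAPR in dB, and the
    cyclic shift applied to each subblock (the first subblock is never shifted)."""
    N = len(X)
    # Adjacent partition: subblock v keeps one contiguous band of subcarriers.
    parts = []
    for v in range(num_blocks):
        Xv = np.zeros(N, dtype=complex)
        band = slice(v * N // num_blocks, (v + 1) * N // num_blocks)
        Xv[band] = X[band]
        parts.append(np.fft.ifft(Xv))

    best_x, best_papr, best_shifts = None, np.inf, None
    # len(shifts) ** (num_blocks - 1) candidates: the exhaustive-search cost.
    for combo in itertools.product(shifts, repeat=num_blocks - 1):
        x = parts[0] + sum(np.roll(p, s) for p, s in zip(parts[1:], combo))
        value = papr_db(x)
        if value < best_papr:
            best_x, best_papr, best_shifts = x, value, (0,) + combo
    return best_x, best_papr, best_shifts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random QPSK symbol on 64 subcarriers.
    X = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
    x, p, s = cs_pts(X)
    print(f"original {papr_db(np.fft.ifft(X)):.2f} dB -> CS-PTS {p:.2f} dB, shifts {s}")
```

    Because the all-zero shift combination reproduces the original IFFT output (by linearity of the IFFT), the selected candidate can never have a higher PAPR than the unmodified symbol. In PTS-style schemes, the receiver must also learn which shifts were applied, so the chosen indices are typically sent as side information.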

  37. arXiv:2007.09102

    eess.SY cs.AI

    Breaking Moravec's Paradox: Visual-Based Distribution in Smart Fashion Retail

    Authors: Shin Woong Sung, Hyunsuk Baek, Hyeonjun Sim, Eun Hie Kim, Hyunwoo Hwangbo, Young Jae Jang

    Abstract: In this paper, we report an industry-academia collaborative study on the distribution method of fashion products using an artificial intelligence (AI) technique combined with an optimization method. To meet the current fashion trend of short product lifetimes and an increasing variety of styles, the company produces limited volumes of a large variety of styles. However, due to the limited volume o…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: 10 pages, 19 figures, The fifth international workshop on fashion and KDD, KDD 2020

  38. arXiv:2003.08818

    cs.CV cs.LG eess.IV

    Brain MRI-based 3D Convolutional Neural Networks for Classification of Schizophrenia and Controls

    Authors: Mengjiao Hu, Kang Sim, Juan Helen Zhou, Xudong Jiang, Cuntai Guan

    Abstract: Convolutional neural networks (CNNs) have been successfully applied to the classification of both natural images and medical images but have not yet been applied to differentiating patients with schizophrenia from healthy controls. Given the subtle, mixed, and sparsely distributed brain atrophy patterns of schizophrenia, the capability of automatic feature learning makes CNNs a powerful tool for classifying sc…

    Submitted 14 March, 2020; originally announced March 2020.

    Comments: 4 pages

  39. arXiv:2002.00699

    eess.SP cs.IT

    HydraWave: Multi-Group Multicast Hybrid Precoding and Low-Latency Scheduling for Ubiquitous Industry 4.0 mmWave Communication

    Authors: Luis F. Abanto-Leon, Matthias Hollick, Gek Hong Sim

    Abstract: Industry 4.0 anticipates massive interconnectivity of industrial devices (e.g., sensors, actuators) to support factory automation and production. Due to the rigidity of wired connections to harmonize with automation, wireless information transfer has attracted substantial attention. However, existing solutions for the manufacturing sector face critical issues in coping with the key performance dem…

    Submitted 2 September, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: IEEE WoWMoM 2020, 10 pages

  40. arXiv:2002.00698

    eess.SP cs.IT

    Fairness-Aware Hybrid Precoding for mmWave NOMA Unicast/Multicast Transmissions in Industrial IoT

    Authors: Luis F. Abanto-Leon, Gek Hong Sim

    Abstract: This paper investigates dual-layer non-orthogonally superimposed transmissions for industrial internet of things (IoT) millimeter-wave communications. Essentially, the overlayer is a ubiquitous multicast signal devised to serve all the devices in coverage with a common message, i.e., critical control packet. The underlayer is a composite signal that consists of private unicast messages. Due to saf…

    Submitted 27 February, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: 7 pages, to appear in IEEE ICC 2020 Proceedings

  41. arXiv:2002.00670  [pdf, ps, other]

    eess.SP cs.IT

    Learning-based Max-Min Fair Hybrid Precoding for mmWave Multicasting

    Authors: Luis F. Abanto-Leon, Gek Hong Sim

    Abstract: This paper investigates the joint design of hybrid transmit precoder and analog receive combiners for single-group multicasting in millimeter-wave systems. We propose LB-GDM, a low-complexity learning-based approach that leverages gradient descent with momentum and alternating optimization to design (i) the digital and analog constituents of a hybrid transmitter and (ii) the analog combiners of ea…

    Submitted 27 February, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: 7 pages. To appear in IEEE ICC 2020 Proceedings

  42. arXiv:2001.08885  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network

    Authors: Mary Gooneratne, Khe Chai Sim, Petr Zadrazil, Andreas Kabel, Françoise Beaufays, Giovanni Motta

    Abstract: Training machine learning models on mobile devices has the potential of improving both privacy and accuracy of the models. However, one of the major obstacles to achieving this goal is the memory limitation of mobile devices. Reducing training memory enables models with high-dimensional weight matrices, like automatic speech recognition (ASR) models, to be trained on-device. In this paper, we prop…

    Submitted 24 January, 2020; originally announced January 2020.

  43. arXiv:1912.09251  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities

    Authors: Khe Chai Sim, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou

    Abstract: We study the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user. These techniques differ in the amounts of user effort required to provide supervision, and are evaluated on how they impact speech recognition performance. We propose using keyword-dependent precision and recall metrics to measure vocabulary acq…

    Submitted 14 December, 2019; originally announced December 2019.

  44. arXiv:1910.10416  [pdf, other]

    cs.NI eess.SP

    6G Massive Radio Access Networks: Key Issues, Technologies, and Future Challenges

    Authors: Ying Loong Lee, Donghong Qin, Li-Chun Wang, Gek Hong Sim

    Abstract: Driven by the emerging use cases in massive access future networks, there is a need for technological advancements and evolutions for wireless communications beyond the fifth-generation (5G) networks. In particular, we envisage the upcoming sixth-generation (6G) networks to consist of numerous devices demanding extremely high-performance interconnections even under strenuous scenarios such as dive…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  45. arXiv:1909.12116  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Optimal Transport driven CycleGAN for Unsupervised Learning in Inverse Problems

    Authors: Byeongsu Sim, Gyutaek Oh, Jeongsol Kim, Chanyong Jung, Jong Chul Ye

    Abstract: To improve the performance of classical generative adversarial network (GAN), Wasserstein generative adversarial networks (W-GAN) was developed as a Kantorovich dual formulation of the optimal transport (OT) problem using Wasserstein-1 distance. However, it was not clear how cycleGAN-type generative models can be derived from the optimal transport theory. Here we show that a novel cycleGAN archite…

    Submitted 30 August, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

    Comments: accepted for publication in the SIAM Journal on Imaging Sciences

  46. arXiv:1909.06678  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models

    Authors: Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

    Abstract: Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers. However, these systems do not always generalize well for users with very different speech characteristics. This issue can be addressed by building personalized systems that are designed to work well for each specific use…

    Submitted 14 September, 2019; originally announced September 2019.

  47. arXiv:1908.02678  [pdf, ps, other]

    eess.SP cs.IT

    Hybrid Precoding for Multi-Group Multicasting in mmWave Systems

    Authors: Luis F. Abanto-Leon, Matthias Hollick, Gek Hong Sim

    Abstract: Multicast beamforming is known to improve spectral efficiency. However, its benefits and challenges for hybrid precoders design in millimeter-wave (mmWave) systems remain understudied. To this end, this paper investigates the first joint design of hybrid transmit precoders (with an arbitrary number of finite-resolution phase shifts) and receive combiners for mmWave multi-group multicasting. Our pr…

    Submitted 3 February, 2020; v1 submitted 7 August, 2019; originally announced August 2019.

    Comments: IEEE GLOBECOM 2019, pp. 1-7

  48. arXiv:1905.09616  [pdf, other]

    eess.SP cs.IT

    A Comparative Study of Analog/Digital Self-Interference Cancellation for Full Duplex Radios

    Authors: Jong Woo Kwak, Min Soo Sim, In-Woong Kang, Jong Sung Park, Jaedon Park, Chan-Byoung Chae

    Abstract: Self-interference (SI) is the main obstacle to full-duplex radios. To overcome the SI, researchers have proposed several analog and digital domain self-interference cancellation (SIC) techniques. How well the digital cancellation works depends on the results of analog cancellation. Therefore, to analyze overall SIC performance, one should do so in an integrated manner. In this paper, we build a si…

    Submitted 23 May, 2019; originally announced May 2019.

  49. arXiv:1810.04121  [pdf]

    eess.SP q-bio.QM

    Inter-Patient ECG Classification with Convolutional and Recurrent Neural Networks

    Authors: Li Guo, Gavin Sim, Bogdan Matuszewski

    Abstract: The recent advances in ECG sensor devices provide opportunities for user self-managed auto-diagnosis and monitoring services over the internet. This imposes the requirements for generic ECG classification methods that are inter-patient and device independent. In this paper, we present our work on using the densely connected convolutional neural network (DenseNet) and gated recurrent unit network (…

    Submitted 27 September, 2018; originally announced October 2018.

    Comments: 10 pages, 8 figures

  50. arXiv:1808.05312  [pdf, other]

    cs.CL eess.AS

    Toward domain-invariant speech recognition via large scale training

    Authors: Arun Narayanan, Ananya Misra, Khe Chai Sim, Golan Pundak, Anshuman Tripathi, Mohamed Elfeky, Parisa Haghani, Trevor Strohman, Michiel Bacchiani

    Abstract: Current state-of-the-art automatic speech recognition systems are trained to work in specific 'domains', defined based on factors like application, sampling rate and codec. When such recognizers are used in conditions that do not match the training domain, performance significantly drops. This work explores the idea of building a single domain-invariant model for varied use-cases by combining larg…

    Submitted 15 August, 2018; originally announced August 2018.