Showing 1–50 of 89 results for author: Choi, H

Searching in archive eess.
  1. arXiv:2406.08328  [pdf, other]

    eess.AS

    Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation

    Authors: Tsun-An Hsieh, Heeyoul Choi, Minje Kim

    Abstract: Recent studies highlight the potential of textual modalities in conditioning the speech separation model's inference process. However, regularization-based methods remain underexplored despite their advantages of not requiring auxiliary text data during the test time. To address this gap, we introduce a timed text-based regularization (TTR) method that uses language model-derived semantics to impr…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2404.06452  [pdf, other]

    cs.RO eess.SY

    PAAM: A Framework for Coordinated and Priority-Driven Accelerator Management in ROS 2

    Authors: Daniel Enright, Yecheng Xiang, Hyunjong Choi, Hyoseung Kim

    Abstract: This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor th…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 14 Pages, 14 Figures

  3. arXiv:2403.19763  [pdf, other]

    cs.SD cs.HC cs.MM eess.AS

    Creating Aesthetic Sonifications on the Web with SIREN

    Authors: Tristan Peng, Hongchan Choi, Jonathan Berger

    Abstract: SIREN is a flexible, extensible, and customizable web-based general-purpose interface for auditory data display (sonification). Designed as a digital audio workstation for sonification, synthesizers written in JavaScript using the Web Audio API facilitate intuitive mapping of data to auditory parameters for a wide range of purposes. This paper explores the breadth of sound synthesis techniques s…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 7 pages, 1 figure, 5 listings, submitted to the Web Audio Conference 2024

  4. arXiv:2403.12498  [pdf, ps, other]

    cs.IT eess.SP

    WMMSE-Based Rate Maximization for RIS-Assisted MU-MIMO Systems

    Authors: Hyuckjin Choi, A. Lee Swindlehurst, Junil Choi

    Abstract: Reconfigurable intelligent surface (RIS) technology, given its ability to favorably modify wireless communication environments, will play a pivotal role in the evolution of future communication systems. This paper proposes rate maximization techniques for both single-user and multiuser MIMO systems, based on the well-known weighted minimum mean square error (WMMSE) criterion. Using a suitable weig…

    Submitted 19 March, 2024; originally announced March 2024.

  5. arXiv:2403.09270  [pdf, ps, other]

    cs.IT eess.SP

    A Deep Reinforcement Learning Approach for Autonomous Reconfigurable Intelligent Surfaces

    Authors: Hyuckjin Choi, Ly V. Nguyen, Junil Choi, A. Lee Swindlehurst

    Abstract: A reconfigurable intelligent surface (RIS) is a prospective wireless technology that enhances wireless channel quality. An RIS is often equipped with passive array of elements and provides cost and power-efficient solutions for coverage extension of wireless communication systems. Without any radio frequency (RF) chains or computing resources, however, the RIS requires control information to be se…

    Submitted 19 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  6. arXiv:2402.18930  [pdf, other]

    eess.IV cs.CV

    Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets

    Authors: Fatih Kamisli, Fabien Racape, Hyomin Choi

    Abstract: Achieving successful variable bitrate compression with computationally simple algorithms from a single end-to-end learned image or video compression model remains a challenge. Many approaches have been proposed, including conditional auto-encoders, channel-adaptive gains for the latent tensor or uniformly quantizing all elements of the latent tensor. This paper follows the traditional approach to…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted as a paper at DCC 2024

  7. arXiv:2401.14421  [pdf, other]

    cs.LG cs.MA eess.SY stat.ML

    Multi-Agent Based Transfer Learning for Data-Driven Air Traffic Applications

    Authors: Chuhao Deng, Hong-Cheol Choi, Hyunsang Park, Inseok Hwang

    Abstract: Research in developing data-driven models for Air Traffic Management (ATM) has gained a tremendous interest in recent years. However, data-driven models are known to have long training time and require large datasets to achieve good performance. To address the two issues, this paper proposes a Multi-Agent Bidirectional Encoder Representations from Transformers (MA-BERT) model that fully considers…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures, submitted for IEEE Transactions on Intelligent Transportation System

  8. arXiv:2312.05548  [pdf, other]

    eess.IV cs.CV cs.LG

    A Unified Multi-Phase CT Synthesis and Classification Framework for Kidney Cancer Diagnosis with Incomplete Data

    Authors: Kwang-Hyun Uhm, Seung-Won Jung, Moon Hyung Choi, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: Multi-phase CT is widely adopted for the diagnosis of kidney cancer due to the complementary information among phases. However, the complete set of multi-phase CT is often not available in practical clinical applications. In recent years, there have been some studies to generate the missing modality image from the available data. Nevertheless, the generated images are not guaranteed to be effectiv…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: This article has been accepted for publication in IEEE Journal of Biomedical and Health Informatics

    Journal ref: JBHI, 2022

  9. arXiv:2312.05334  [pdf, other]

    eess.IV cs.CV

    ProsDectNet: Bridging the Gap in Prostate Cancer Detection via Transrectal B-mode Ultrasound Imaging

    Authors: Sulaiman Vesal, Indrani Bhattacharya, Hassan Jahanandish, Xinran Li, Zachary Kornberg, Steve Ran Zhou, Elijah Richard Sommer, Moon Hyung Choi, Richard E. Fan, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Interpreting traditional B-mode ultrasound images can be challenging due to image artifacts (e.g., shadowing, speckle), leading to low sensitivity and limited diagnostic accuracy. While Magnetic Resonance Imaging (MRI) has been proposed as a solution, it is expensive and not widely available. Furthermore, most biopsies are guided by Transrectal Ultrasound (TRUS) alone and can miss up to 52% cancer…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Accepted in NeurIPS 2023 (Medical Imaging meets NeurIPS Workshop)

  10. arXiv:2311.12454  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

    Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seung-Bin Kim, Seong-Whan Lee

    Abstract: Large language models (LLM)-based speech synthesis has been widely adopted in zero-shot speech synthesis. However, they require a large-scale data and possess the same limitations as previous autoregressive speech models, including slow inference speed and lack of robustness. This paper proposes HierSpeech++, a fast and strong zero-shot speech synthesizer for text-to-speech (TTS) and voice convers…

    Submitted 27 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 16 pages, 9 figures, 12 tables

  11. arXiv:2311.04693  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation

    Authors: Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Although voice conversion (VC) systems have shown a remarkable ability to transfer voice style, existing methods still have an inaccurate pitch and low speaker adaptation quality. To address these challenges, we introduce Diff-HierVC, a hierarchical VC system based on two diffusion models. We first introduce DiffPitch, which can effectively generate F0 with the target voice style. Subsequently, th…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: INTERSPEECH 2023 (Oral)

  12. arXiv:2311.02581  [pdf, other]

    cs.SD eess.AS

    Yet Another Generative Model For Room Impulse Response Estimation

    Authors: Sungho Lee, Hyeong-Seok Choi, Kyogu Lee

    Abstract: Recent neural room impulse response (RIR) estimators typically comprise an encoder for reference audio analysis and a generator for RIR synthesis. Especially, it is the performance of the generator that directly influences the overall estimation quality. In this context, we explore an alternate generator architecture for improved performance. We first train an autoencoder with residual quantizatio…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: WASPAA 2023

  13. arXiv:2310.14506  [pdf, other]

    eess.SP cs.DB

    Label Space Partition Selection for Multi-Object Tracking Using Two-Layer Partitioning

    Authors: Ji Youn Lee, Changbeom Shim, Hoa Van Nguyen, Tran Thien Dat Nguyen, Hyunjin Choi, Youngho Kim

    Abstract: Estimating the trajectories of multi-objects poses a significant challenge due to data association ambiguity, which leads to a substantial increase in computational requirements. To address such problems, a divide-and-conquer manner has been employed with parallel computation. In this strategy, distinguished objects that have unique labels are grouped based on their statistical dependencies, the i…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures

  14. arXiv:2310.10633  [pdf, other]

    physics.optics eess.IV

    Telescope imaging beyond the Rayleigh limit in extremely low SNR

    Authors: Hyunsoo Choi, Seungman Choi, Peter Menart, Angshuman Deka, Zubin Jacob

    Abstract: The Rayleigh limit and low Signal-to-Noise Ratio (SNR) scenarios pose significant limitations to optical imaging systems used in remote sensing, infrared thermal imaging, and space domain awareness. In this study, we introduce a Stochastic Sub-Rayleigh Imaging (SSRI) algorithm to localize point objects and estimate their positions, brightnesses, and number in low SNR conditions, even below the Ray…

    Submitted 17 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 18 pages, 5 figures

  15. arXiv:2310.06546  [pdf, other]

    cs.SD cs.CL eess.AS

    AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion

    Authors: Haeyun Choi, Jio Gim, Yuho Lee, Youngin Kim, Young-Joo Suh

    Abstract: This paper proposes a simple and robust zero-shot voice conversion system with a cycle structure and mel-spectrogram pre-processing. Previous works suffer from information loss and poor synthesis quality due to their reliance on a carefully designed bottleneck structure. Moreover, models relying solely on self-reconstruction loss struggled with reproducing different speakers' voices. To address th…

    Submitted 10 October, 2023; originally announced October 2023.

  16. arXiv:2309.01262  [pdf, other]

    cs.CV cs.HC cs.LG eess.SP

    Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition

    Authors: Hyeongju Choi, Apoorva Beedu, Irfan Essa

    Abstract: Human Activity Recognition (HAR) systems have been extensively studied by the vision and ubiquitous computing communities due to their practical applications in daily life, such as smart homes, surveillance, and health monitoring. Typically, this process is supervised in nature and the development of such systems requires access to large quantities of annotated data. However, the higher costs…

    Submitted 3 September, 2023; originally announced September 2023.

  17. arXiv:2308.02133  [pdf, other]

    eess.SP

    NeuralEQ: Neural-Network-Based Equalizer for High-Speed Wireline Communication

    Authors: Hanseok Kim, Jae Hyung Ju, Hyun Seok Choi, Hyeri Roh, Woo-Seok Choi

    Abstract: With the growing demand for high-bandwidth applications like video streaming and cloud services, the data transfer rates required for wireline communication keeps increasing, making the channel loss a major obstacle in achieving low bit error rate (BER). Equalization techniques such as feed-forward equalizer (FFE) and decision feedback equalizer (DFE) are commonly used to compensate for channel lo…

    Submitted 4 August, 2023; originally announced August 2023.

  18. arXiv:2307.16171  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer

    Authors: Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh, Seong-Whan Lee

    Abstract: Despite rapid progress in the voice style transfer (VST) field, recent zero-shot VST systems still lack the ability to transfer the voice style of a novel speaker. In this paper, we present HierVST, a hierarchical adaptive end-to-end zero-shot VST model. Without any text transcripts, we only use the speech dataset to train the model by utilizing hierarchical variational inference and self-supervis…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: INTERSPEECH 2023 (Oral)

  19. arXiv:2305.15816  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion

    Authors: Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Diffusion-based generative models have exhibited powerful generative performance in recent years. However, as many attributes exist in the data distribution and owing to several limitations of sharing the model parameters across all levels of the generation process, it remains challenging to control specific styles for each attribute. To address the above problem, this paper presents decoupled den…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 23 pages, 10 figures, 17 tables, under review

  20. arXiv:2305.13970  [pdf, other]

    eess.SY

    Darwin: A DRAM-based Multi-level Processing-in-Memory Architecture for Data Analytics

    Authors: Donghyuk Kim, Jae-Young Kim, Wontak Han, Jongsoon Won, Haerang Choi, Yongkee Kwon, Joo-Young Kim

    Abstract: Processing-in-memory (PIM) architecture is an inherent match for data analytics application, but we observe major challenges to address when accelerating it using PIM. In this paper, we propose Darwin, a practical LRDIMM-based multi-level PIM architecture for data analytics, which fully exploits the internal bandwidth of DRAM using the bank-, bank group-, chip-, and rank-level parallelisms. Consid…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 14 pages, 16 figures

  21. arXiv:2304.11385  [pdf, ps, other]

    cs.IT eess.SP

    WiThRay: A Versatile Ray-Tracing Simulator for Smart Wireless Environments

    Authors: Hyuckjin Choi, Jaehoon Chung, Jaeky Oh, George C. Alexandropoulos, Junil Choi

    Abstract: This paper presents the development and evaluation of WiThRay, a new wireless three-dimensional ray-tracing (RT) simulator. RT-based simulators are widely used for generating realistic channel data by combining RT methodology to get signal trajectories and electromagnetic (EM) equations, resulting in generalized channel impulse responses (CIRs). This paper first provides a comprehensive comparison…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: 23 pages, 25 figures, submitted to IEEE Access

  22. Cross-domain Denoising for Low-dose Multi-frame Spiral Computed Tomography

    Authors: Yucheng Lu, Zhixin Xu, Moon Hyung Choi, Jimin Kim, Seung-Won Jung

    Abstract: Computed tomography (CT) has been used worldwide as a non-invasive test to assist in diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation doses has driven researchers to improve reconstruction quality. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the effe…

    Submitted 28 June, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

    Journal ref: IEEE Transactions on Medical Imaging (2024)

  23. arXiv:2303.05686  [pdf, other]

    eess.IV cs.CV

    Generative AI for Rapid Diffusion MRI with Improved Image Quality, Reliability and Generalizability

    Authors: Amir Sadikov, Xinlei Pan, Hannah Choi, Lanya T. Cai, Pratik Mukherjee

    Abstract: Diffusion MRI is a non-invasive, in-vivo biomedical imaging method for mapping tissue microstructure. Applications include structural connectivity imaging of the human brain and detecting microstructural neural changes. However, acquiring high signal-to-noise ratio dMRI datasets with high angular and spatial resolution requires prohibitively long scan times, limiting usage in many important clinic…

    Submitted 6 October, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  24. arXiv:2302.01738  [pdf, other]

    eess.IV cs.LG

    AIROGS: Artificial Intelligence for RObust Glaucoma Screening Challenge

    Authors: Coen de Vente, Koenraad A. Vermeer, Nicolas Jaccard, He Wang, Hongyi Sun, Firas Khader, Daniel Truhn, Temirgali Aimyshev, Yerkebulan Zhanibekuly, Tien-Dung Le, Adrian Galdran, Miguel Ángel González Ballester, Gustavo Carneiro, Devika R G, Hrishikesh P S, Densen Puthussery, Hong Liu, Zekang Yang, Satoshi Kondo, Satoshi Kasai, Edward Wang, Ashritha Durvasula, Jónathan Heras, Miguel Ángel Zapata, Teresa Araújo, et al. (11 additional authors not shown)

    Abstract: The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios…

    Submitted 10 February, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: 19 pages, 8 figures, 3 tables

  25. arXiv:2301.04183  [pdf, other]

    eess.IV

    Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines

    Authors: Ezgi Ozyilkan, Mateen Ulhaq, Hyomin Choi, Fabien Racape

    Abstract: As an increasing amount of image and video content will be analyzed by machines, there is demand for a new codec paradigm that is capable of compressing visual input primarily for the purpose of computer vision inference, while secondarily supporting input reconstruction. In this work, we propose a learned compression architecture that can be used to build such a codec. We introduce a novel variat…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: accepted as a paper for DCC 2023

  26. arXiv:2301.01290  [pdf, other]

    eess.IV

    Frequency-aware Learned Image Compression for Quality Scalability

    Authors: Hyomin Choi, Fabien Racape, Shahab Hamidi-Rad, Mateen Ulhaq, Simon Feltman

    Abstract: Spatial frequency analysis and transforms serve a central role in most engineered image and video lossy codecs, but are rarely employed in neural network (NN)-based approaches. We propose a novel NN-based image coding framework that utilizes forward wavelet transforms to decompose the input signal by spatial frequency. Our encoder generates separate bitstreams for each latent representation of low…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Presented at VCIP'22

  27. arXiv:2212.06387  [pdf, other]

    cs.SD eess.AS

    Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric

    Authors: Hyeongju Kim, Hyeong-Seok Choi

    Abstract: Phoneme boundary detection has been studied due to its central role in various speech applications. In this work, we point out that this task needs to be addressed not only by algorithmic way, but also by evaluation metric. To this end, we first propose a state-of-the-art phoneme boundary detector that operates in an autoregressive manner, dubbed SuperSeg. Experiments on the TIMIT and Buckeye corp…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: 5 pages, submitted to ICASSP 2023

  28. arXiv:2211.09407  [pdf, other]

    cs.SD eess.AS

    NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

    Authors: Hyeong-Seok Choi, Jinhyeok Yang, Juheon Lee, Hyeongju Kim

    Abstract: Various applications of voice synthesis have been developed independently despite the fact that they generate "voice" as output in common. In addition, most of the voice synthesis models still require a large number of audio data paired with annotated labels (e.g., text transcription and music score) for training. To this end, we propose a unified framework of synthesizing and manipulating voice s…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Submitted to ICLR 2023

  29. arXiv:2211.03078  [pdf, other]

    eess.AS cs.SD

    An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

    Authors: Jihwan Lee, Jae-Sung Bae, Seongkyu Mun, Heejin Choi, Joun Yeop Lee, Hoon-Young Cho, Chanwoo Kim

    Abstract: With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. The vowel space analysis, which is often utilized to explore various aspects of language including L2 accents, is a great alternative analysis tool. In this study, we apply th…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  30. arXiv:2210.05524  [pdf, other]

    cs.RO eess.SY

    A Learning-Based Estimation and Control Framework for Contact-Intensive Tight-Tolerance Tasks

    Authors: Bukun Son, Hyelim Choi, Jaemin Yoon, Dongjun Lee

    Abstract: We present a two-stage framework that integrates a learning-based estimator and a controller, designed to address contact-intensive tasks. The estimator leverages a Bayesian particle filter with a mixture density network (MDN) structure, effectively handling multi-modal issues arising from contact information. The controller combines a self-supervised and reinforcement learning (RL) approach, stra…

    Submitted 1 August, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

  31. arXiv:2209.07780  [pdf, ps, other]

    eess.SY cs.RO

    Computing Forward Reachable Sets for Nonlinear Adaptive Multirotor Controllers

    Authors: Juyeop Han, Han-Lim Choi

    Abstract: In multirotor systems, guaranteeing safety while considering unknown disturbances is essential for robust trajectory planning. The Forward reachable set (FRS), the set of feasible states subject to bounded disturbances, can be utilized to identify robust and collision-free trajectories by checking the intersections with obstacles. However, in many cases, the FRS is not calculated in real time and…

    Submitted 6 March, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

    Comments: 8 pages, 3 figures, Accepted to ACC 2023

    MSC Class: J.2

  32. arXiv:2208.06132  [pdf, ps, other]

    cs.IT eess.SP

    On the Physical Layer Security of Visible Light Communications Empowered by Gold Nanoparticles

    Authors: Geonho Han, Hyuckjin Choi, Ryeong Myeong Kim, Ki Tae Nam, Junil Choi, Theodoros A. Tsiftsis

    Abstract: Visible light is a proper spectrum for secure wireless communications because of its high directivity and impermeability in indoor scenarios. However, if an eavesdropper is located very close to a legitimate receiver, secure communications become highly risky. In this paper, to further increase the level of security of visible light communication (VLC) and increase its resilience against to malici…

    Submitted 7 June, 2024; v1 submitted 12 August, 2022; originally announced August 2022.

  33. arXiv:2208.02512  [pdf, other]

    eess.IV cs.CV

    Scalable Video Coding for Humans and Machines

    Authors: Hyomin Choi, Ivan V. Bajić

    Abstract: Video content is watched not only by humans, but increasingly also by machines. For example, machine learning models analyze surveillance video for security and traffic monitoring, search through YouTube videos for inappropriate content, and so on. In this paper, we propose a scalable video coding framework that supports machine vision (specifically, object detection) through its base layer bitstr…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: 6 pages, 5 figures, IEEE MMSP 2022

  34. arXiv:2205.01874  [pdf, other]

    eess.IV cs.CV

    Joint Image Compression and Denoising via Latent-Space Scalability

    Authors: Saeed Ranjbar Alvar, Mateen Ulhaq, Hyomin Choi, Ivan V. Bajić

    Abstract: When it comes to image compression in digital cameras, denoising is traditionally performed prior to compression. However, there are applications where image noise may be necessary to demonstrate the trustworthiness of the image, such as court evidence and image forensics. This means that noise itself needs to be coded, in addition to the clean image itself. In this paper, we present a learning-ba…

    Submitted 4 September, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

  35. arXiv:2204.03249  [pdf, other]

    cs.SD eess.AS

    Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder

    Authors: Juheon Lee, Hyeong-Seok Choi, Kyogu Lee

    Abstract: This paper proposes a controllable singing voice synthesis system capable of generating expressive singing voice with two novel methodologies. First, a local style token module, which predicts frame-level style tokens from an input pitch and text sequence, is proposed to allow the singing voice system to control musical expression often unspecified in sheet music (e.g., breathing and intensity). S…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: 4 pages, Submitted to Interspeech 2022

  36. arXiv:2204.01271  [pdf, other]

    eess.AS cs.LG cs.SD

    Into-TTS : Intonation Template Based Prosody Control System

    Authors: Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim

    Abstract: Intonations play an important role in delivering the intention of a speaker. However, current end-to-end TTS systems often fail to model proper intonations. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different intonations using predefined intonation templates. Prior to TTS model training, speech data are grouped into intonation templates in an unsupervi…

    Submitted 6 November, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2023

  37. arXiv:2203.00906  [pdf, other]

    eess.SY

    Distributed goal assignment strategy for improving leader-following formation control performance

    Authors: Yun Ho Choi, Doik Kim

    Abstract: This paper investigates a distributed goal assignment problem in leader-following formation control of second-order multi-agent systems. It is assumed that each agent can communicate with nearby agents within the communication range and the leader information is only available to a subset of agents. Compared with existing formation control schemes addressing the goal assignment issue, the main con…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: 26 pages, 11 figures, journal

  38. arXiv:2202.13598  [pdf, other]

    eess.SY

    Red Light, Green Light Game of Multi-Robot Systems with Safety Barrier Certificates

    Authors: Yun Ho Choi, Doik Kim

    Abstract: In this paper, we propose the safety barrier certificates for uncertain multi-robot systems playing red light, green light game. According to the rule of the game, the robots are allowed to move forward after a doll shouts `green light' and must stop when it shouts `red light'. Following this rule, a two-mode nominal controller is designed where one mode is for moving forward and the other one is…

    Submitted 28 February, 2022; originally announced February 2022.

    Comments: 6 pages, 5 figures, IEEE Robotics and Automation Letters with IROS option

  39. arXiv:2202.01856  [pdf, ps, other]

    math.OC eess.SY math.FA

    Data-Driven Optimal Control via Linear Transfer Operators: A Convex Approach

    Authors: Joseph Moyalan, Hyungjin Choi, Yongxin Chen, Umesh Vaidya

    Abstract: This paper is concerned with data-driven optimal control of nonlinear systems. We present a convex formulation to the optimal control problem (OCP) with a discounted cost function. We consider OCP with both positive and negative discount factor. The convex approach relies on lifting nonlinear system dynamics in the space of densities using the linear Perron-Frobenius (P-F) operator. This lifting l…

    Submitted 3 February, 2022; originally announced February 2022.

  40. arXiv:2112.14934  [pdf, other]

    cs.CV eess.IV

    SFU-HW-Tracks-v1: Object Tracking Dataset on Raw Video Sequences

    Authors: Takehiro Tanaka, Hyomin Choi, Ivan V. Bajić

    Abstract: We present a dataset that contains object annotations with unique object identities (IDs) for the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. Ground-truth annotations for 13 sequences were prepared and released as the dataset called SFU-HW-Tracks-v1. For each video frame, ground truth annotations include object class ID, object ID, and bounding box location and i…

    Submitted 30 December, 2021; originally announced December 2021.

    Comments: 4 pages, 3 figures, submitted to Data in Brief

  41. arXiv:2111.06401  [pdf]

    eess.IV cs.CV

    Stacked U-Nets with Self-Assisted Priors Towards Robust Correction of Rigid Motion Artifact in Brain MRI

    Authors: Mohammed A. Al-masni, Seul Lee, Jaeuk Yi, Sewook Kim, Sung-Min Gho, Young Hun Choi, Dong-Hyun Kim

    Abstract: In this paper, we develop an efficient retrospective deep learning method called stacked U-Nets with self-assisted priors to address the problem of rigid motion artifacts in MRI. The proposed work exploits the usage of additional knowledge priors from the corrupted images themselves without the need for additional contrast data. The proposed network learns missed structural details through sharing…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: 24 pages, 10 figures, 3 tables

  42. arXiv:2110.14513  [pdf, other]

    cs.SD cs.AI eess.AS

    Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

    Authors: Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee

    Abstract: We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have focused on using information bottleneck to disentangle analysis features for controllable synthesis, which usually results in poor reconstruction quality. We address this issue by proposing a novel training strategy based on informa…

    Submitted 28 October, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Neural Information Processing Systems (NeurIPS) 2021

  43. Scalable Image Coding for Humans and Machines

    Authors: Hyomin Choi, Ivan V. Bajic

    Abstract: At present, and increasingly so in the future, much of the captured visual content will not be seen by humans. Instead, it will be used for automated machine vision analytics and may require occasional human viewing. Examples of such applications include traffic monitoring, visual surveillance, autonomous navigation, and industrial machine vision. To address such requirements, we develop an end-to…

    Submitted 13 January, 2022; v1 submitted 18 July, 2021; originally announced July 2021.

    Comments: Submitted for peer review to IEEE Transactions

  44. arXiv:2106.10426  [pdf, other]

    cs.IT cs.LG eess.SP

    Algorithm Unrolling for Massive Access via Deep Neural Network with Theoretical Guarantee

    Authors: Yandong Shi, Hayoung Choi, Yuanming Shi, Yong Zhou

    Abstract: Massive access is a critical design challenge of Internet of Things (IoT) networks. In this paper, we consider the grant-free uplink transmission of an IoT network with a multiple-antenna base station (BS) and a large number of single-antenna IoT devices. Taking into account the sporadic nature of IoT devices, we formulate the joint activity detection and channel estimation (JADCE) problem as a gr…

    Submitted 19 June, 2021; originally announced June 2021.

    Comments: 15 pages, 15 figures; submitted to IEEE Transactions on Wireless Communications

  45. arXiv:2105.13940  [pdf, other]

    cs.SD eess.AS

    Differentiable Artificial Reverberation

    Authors: Sungho Lee, Hyeong-Seok Choi, Kyogu Lee

    Abstract: Artificial reverberation (AR) models play a central role in various audio applications. Therefore, estimating the AR model parameters (ARPs) of a reference reverberation is a crucial task. Although a few recent deep-learning-based approaches have shown promising performance, their non-end-to-end training scheme prevents them from fully exploiting the potential of deep neural networks. This motivat…

    Submitted 20 July, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Accepted to TASLP

  46. arXiv:2105.11681  [pdf, other]

    cs.LG cs.SD eess.AS

    Deep Neural Networks and End-to-End Learning for Audio Compression

    Authors: Daniela N. Rim, Inseon Jang, Heeyoul Choi

    Abstract: Recent achievements in end-to-end deep learning have encouraged the exploration of tasks dealing with highly structured data with unified deep network models. Having such models for compressing audio signals has been challenging since it requires discrete representations that are not easy to train with end-to-end backpropagation. In this paper, we present an end-to-end deep learning approach that…

    Submitted 13 July, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

  47. arXiv:2105.10089  [pdf, other]

    eess.IV

    Latent-space scalability for multi-task collaborative intelligence

    Authors: Hyomin Choi, Ivan V. Bajic

    Abstract: We investigate latent-space scalability for multi-task collaborative intelligence, where one of the tasks is object detection and the other is input reconstruction. In our proposed approach, part of the latent space can be selectively decoded to support object detection while the remainder can be decoded when input reconstruction is needed. Such an approach allows reduced computational resources w…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: To be presented in IEEE ICIP'21

  48. Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence

    Authors: Robert A. Cohen, Hyomin Choi, Ivan V. Bajić

    Abstract: In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features outpu…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in IEEE Open Journal of Circuits and Systems

    Journal ref: IEEE Open Journal of Circuits and Systems, vol. 2, 13 May 2021, pp. 350-362

  49. Lightweight compression of neural network feature tensors for collaborative intelligence

    Authors: Robert A. Cohen, Hyomin Choi, Ivan V. Bajić

    Abstract: In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device, and the remainder of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DN…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in IEEE ICME 2020

    Journal ref: 2020 IEEE International Conference on Multimedia and Expo (ICME)

  50. arXiv:2104.10431  [pdf, other]

    cs.SD eess.AS

    Room adaptive conditioning method for sound event classification in reverberant environments

    Authors: Jaejun Lee, Donmoon Lee, Hyeong-Seok Choi, Kyogu Lee

    Abstract: Ensuring performance robustness for a variety of situations that can occur in real-world environments is one of the challenging tasks in sound event classification. One of the unpredictable and detrimental factors in performance, especially in indoor environments, is reverberation. To alleviate this problem, we propose a conditioning method that provides room impulse response (RIR) information to…

    Submitted 21 April, 2021; originally announced April 2021.

    Comments: 5 pages, 3 figures, In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)