
Showing 1–50 of 56 results for author: Choi, K

Searching in archive eess.
  1. arXiv:2406.09282  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

    Authors: Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe

    Abstract: The Open Whisper-style Speech Model (OWSM) series was introduced to achieve full transparency in building advanced speech-to-text (S2T) foundation models. To this end, OWSM models are trained on 25 public speech datasets, which are heterogeneous in multiple ways. In this study, we advance the OWSM series by introducing OWSM v3.2, which improves on prior models by investigating and addressing the i…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2406.08619  [pdf, other]

    cs.CL cs.LG eess.AS

    Self-Supervised Speech Representations are More Phonetic than Semantic

    Authors: Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised speech models (S3Ms) have become an effective backbone for speech applications. Various analyses suggest that S3Ms encode linguistic properties. In this work, we seek a more fine-grained analysis of the word-level linguistic properties encoded in S3Ms. Specifically, we curate a novel dataset of near homophone (phonetically similar) and synonym (semantically similar) word pairs and…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024. Source code at https://github.com/juice500ml/phonetic_semantic_probing

  3. arXiv:2404.18705  [pdf, other]

    cs.IT eess.SP

    Wireless Information and Energy Transfer in the Era of 6G Communications

    Authors: Constantinos Psomas, Konstantinos Ntougias, Nikita Shanin, Dongfang Xu, Kenneth MacSporran Mayer, Nguyen Minh Tran, Laura Cottatellucci, Kae Won Choi, Dong In Kim, Robert Schober, Ioannis Krikidis

    Abstract: Wireless information and energy transfer (WIET) represents an emerging paradigm which employs controllable transmission of radio-frequency signals for the dual purpose of data communication and wireless charging. As such, WIET is widely regarded as an enabler of envisioned 6G use cases that rely on energy-sustainable Internet-of-Things (IoT) networks, such as smart cities and smart grids. Meeting…

    Submitted 16 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Proceedings of the IEEE, 36 pages, 33 figures

  4. A Comparative Analysis of Poetry Reading Audio: Singing, Narrating, or Somewhere In Between?

    Authors: Kahyun Choi, Minje Kim

    Abstract: This paper provides a computational analysis of poetry reading audio signals at a large scale to unveil the musicality within professionally-read poems. Although the acoustic characteristics of other types of spoken language have been extensively studied, most of the literature is limited to narrative speech or singing voice, discussing how different they are from each other. In this work, we deve…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 1296-1300

  5. arXiv:2403.17508  [pdf, other]

    cs.SD eess.AS

    Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant

    Authors: Modan Tailleur, Junwon Lee, Mathieu Lagrange, Keunwoo Choi, Laurie M. Heller, Keisuke Imoto, Yuki Okamoto

    Abstract: This paper explores whether considering alternative domain-specific embeddings to calculate the Fréchet Audio Distance (FAD) metric can help the FAD to correlate better with perceptual ratings of environmental sounds. We used embeddings from VGGish, PANNs, MS-CLAP, L-CLAP, and MERT, which are tailored for either music or environmental sound evaluation. The FAD scores were calculated for sounds fro…

    Submitted 26 March, 2024; originally announced March 2024.

  6. arXiv:2403.03499  [pdf, other]

    eess.SY

    CNN-based End-to-End Adaptive Controller with Stability Guarantees

    Authors: Myeongseok Ryu, Kyunghwan Choi

    Abstract: This letter proposes a convolutional neural network (CNN)-based adaptive controller with three notable features: 1) it determines control input directly from historical sensor data (in an end-to-end process); 2) it learns the desired control policy during real-time implementation without using a pretrained network (in an online adaptive manner); and 3) the asymptotic tracking error convergence is…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 6 pages, 3 figures, Submitted to IEEE L-CSS with CDC Option

  7. arXiv:2401.16658  [pdf, ps, other]

    cs.CL eess.AS

    OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

    Authors: Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe

    Abstract: Recent studies have highlighted the importance of fully open foundation models. The Open Whisper-style Speech Model (OWSM) is an initial step towards reproducing OpenAI Whisper using public data and open-source toolkits. However, previous versions of OWSM (v1 to v3) are still based on standard Transformer, which might lead to inferior performance compared to state-of-the-art speech encoder archite…

    Submitted 16 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at INTERSPEECH 2024. Webpage: https://www.wavlab.org/activities/2024/owsm/

  8. arXiv:2312.10019  [pdf, other]

    cs.IT cs.LG eess.AS

    Understanding Probe Behaviors through Variational Bounds of Mutual Information

    Authors: Kwanghee Choi, Jee-weon Jung, Shinji Watanabe

    Abstract: With the success of self-supervised representations, researchers seek a better understanding of the information encapsulated within a representation. Among various interpretability methods, we focus on classification-based linear probing. We aim to foster a solid understanding and provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory. Fi…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024, implementation available at https://github.com/juice500ml/information_probing

  9. arXiv:2309.15800  [pdf, other]

    cs.CL cs.SD eess.AS

    Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

    Authors: Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang

    Abstract: Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech features such as spectrograms are often used as the input for the subsequent model. However, they can still be redundant. Recent investigations proposed the use of discrete speech units derived from self-supervised learning repre…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  10. arXiv:2309.14967  [pdf, other]

    cs.CV eess.IV

    A novel approach for holographic 3D content generation without depth map

    Authors: Hakdong Kim, Minkyu Jee, Yurim Lee, Kyudam Choi, MinSung Yoon, Cheongwon Kim

    Abstract: In preparation for observing holographic 3D content, acquiring a set of RGB color and depth map images per scene is necessary to generate computer-generated holograms (CGHs) when using the fast Fourier transform (FFT) algorithm. However, in real-world situations, these paired formats of RGB color and depth map images are not always fully available. We propose a deep learning-based method to synthe…

    Submitted 26 September, 2023; originally announced September 2023.

  11. arXiv:2309.12047  [pdf, other]

    cs.CV cs.GR eess.IV

    Self-Calibrating, Fully Differentiable NLOS Inverse Rendering

    Authors: Kiseok Choi, Inchul Kim, Dongyoung Choi, Julio Marco, Diego Gutierrez, Min H. Kim

    Abstract: Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion ambiguities and capture noise, which are typically mitigated through the manual selection of filtering functions and parameters. We introduce a fully…

    Submitted 25 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Journal ref: Proceedings of ACM SIGGRAPH Asia 2023 (December 2023)

  12. arXiv:2308.16389  [pdf, other]

    cs.SD cs.CY eess.AS

    The Biased Journey of MSD_AUDIO.ZIP

    Authors: Haven Kim, Keunwoo Choi, Mateusz Modrzejewski, Cynthia C. S. Liem

    Abstract: The equitable distribution of academic data is crucial for ensuring equal research opportunities, and ultimately further progress. Yet, due to the complexity of using the API for audio data that corresponds to the Million Song Dataset along with its misreporting (before 2016) and the discontinuation of this API (after 2016), access to this data has become restricted to those within certain affilia…

    Submitted 1 December, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Late-breaking/Demo ISMIR 2023

  13. arXiv:2307.16372  [pdf, other]

    cs.SD cs.IR cs.MM eess.AS

    LP-MusicCaps: LLM-Based Pseudo Music Captioning

    Authors: SeungHeon Doh, Keunwoo Choi, Jongpil Lee, Juhan Nam

    Abstract: Automatic music captioning, which generates natural language descriptions for given music tracks, holds significant potential for enhancing the understanding and organization of large volumes of musical data. Despite its importance, researchers face challenges due to the costly and time-consuming collection process of existing music-language datasets, which are limited in size. To address this dat…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: Accepted for publication at the 24th International Society for Music Information Retrieval Conference (ISMIR 2023)

  14. arXiv:2307.04377  [pdf, other]

    cs.SD eess.AS

    HCLAS-X: Hierarchical and Cascaded Lyrics Alignment System Using Multimodal Cross-Correlation

    Authors: Minsung Kang, Soochul Park, Keunwoo Choi

    Abstract: In this work, we address the challenge of lyrics alignment, which involves aligning the lyrics and vocal components of songs. This problem requires the alignment of two distinct modalities, namely text and audio. To overcome this challenge, we propose a model that is trained in a supervised manner, utilizing the cross-correlation matrix of latent representations between vocals and lyrics. Our syst…

    Submitted 10 July, 2023; originally announced July 2023.

  15. arXiv:2307.04292  [pdf, other]

    eess.AS cs.AI

    A Demand-Driven Perspective on Generative Audio AI

    Authors: Sangshin Oh, Minsung Kang, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon

    Abstract: To achieve successful deployment of AI research, it is crucial to understand the demands of the industry. In this paper, we present the results of a survey conducted with professional audio engineers, in order to determine research priorities and define various research tasks. We also summarize the current challenges in audio quality and controllability based on the survey. Our analysis emphasizes…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 10 pages, 7 figures

  16. arXiv:2305.18392  [pdf, other]

    cs.SD cs.LG eess.AS

    Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification

    Authors: Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung

    Abstract: This paper proposes an improved Goodness of Pronunciation (GoP) that utilizes Uncertainty Quantification (UQ) for automatic speech intelligibility assessment for dysarthric speech. Current GoP methods rely heavily on neural network-driven overconfident predictions, which is unsuitable for assessing dysarthric speech due to its significant acoustic differences from healthy speech. To alleviate the…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  17. arXiv:2305.15898  [pdf, other]

    cs.SD eess.AS

    Room Impulse Response Estimation in a Multiple Source Environment

    Authors: Kyungyun Lee, Jeonghun Seo, Keunwoo Choi, Sangmoon Lee, Ben Sangbae Chon

    Abstract: In real-world acoustic scenarios, there often are multiple sound sources present in a room. These sources are situated in various locations and produce sounds that reach the listener from multiple directions. The presence of multiple sources in a room creates new challenges in estimating the room impulse response (RIR) as each source has a unique RIR, dependent on its location and orientation. The…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 2023 AES International Conference on Spatial and Immersive Audio

  18. arXiv:2304.12521  [pdf, other]

    cs.SD eess.AS

    Foley Sound Synthesis at the DCASE 2023 Challenge

    Authors: Keunwoo Choi, Jaekwon Im, Laurie Heller, Brian McFee, Keisuke Imoto, Yuki Okamoto, Mathieu Lagrange, Shinosuke Takamichi

    Abstract: The addition of Foley sound effects during post-production is a common technique used to enhance the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, which involves manual recording and mixing of sound. However, recent advances in sound synthesis and generative models have generated interest in machine-assisted or automatic F…

    Submitted 28 September, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: DCASE 2023 Challenge - Task 7 - Technical Report (Submitted to DCASE 2023 Workshop)

  19. arXiv:2304.03940  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS

    Unsupervised Speech Representation Pooling Using Vector Quantization

    Authors: Jeongkyun Park, Kwanghee Choi, Hyunjun Heo, Hyung-Min Park

    Abstract: With the advent of general-purpose speech representations from large-scale self-supervised models, applying a single model to multiple downstream tasks is becoming a de-facto approach. However, the pooling problem remains; the length of speech representations is inherently variable. The naive average pooling is often used, even though it ignores the characteristics of speech, such as differently l…

    Submitted 8 April, 2023; originally announced April 2023.

  20. arXiv:2303.10539  [pdf, other]

    cs.SD cs.IR cs.MM eess.AS

    Textless Speech-to-Music Retrieval Using Emotion Similarity

    Authors: SeungHeon Doh, Minz Won, Keunwoo Choi, Juhan Nam

    Abstract: We introduce a framework that recommends music based on the emotions of speech. In content creation and daily life, speech contains information about human emotions, which can be enhanced by music. Our framework focuses on a cross-domain retrieval system to bridge the gap between speech and music via emotion labels. We explore different speech representations and report their impact on different s…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: To Appear IEEE ICASSP 2023

  21. arXiv:2303.05715  [pdf, other]

    eess.IV cs.CV

    Context-Based Trit-Plane Coding for Progressive Image Compression

    Authors: Seungmin Jeon, Kwang Pyo Choi, Youngo Park, Chang-Su Kim

    Abstract: Trit-plane coding enables deep progressive image compression, but it cannot use autoregressive context models. In this paper, we propose the context-based trit-plane coding (CTC) algorithm to achieve progressive compression more compactly. First, we develop the context-based rate reduction module to estimate trit probabilities of latent elements accurately and thus encode the trit-planes compactly…

    Submitted 13 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  22. arXiv:2302.00286  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training

    Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans

    Abstract: In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilize…

    Submitted 1 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2206.10805

  23. arXiv:2211.14558  [pdf, other]

    cs.IR cs.MM cs.SD eess.AS

    Toward Universal Text-to-Music Retrieval

    Authors: SeungHeon Doh, Minz Won, Keunwoo Choi, Juhan Nam

    Abstract: This paper introduces effective design choices for text-to-music retrieval systems. An ideal text-based retrieval system would support various input queries such as pre-defined tags, unseen tags, and sentence-level descriptions. In reality, most previous works mainly focused on a single query type (tag or sentence) which may not generalize to another input type. Hence, we review recent text-based…

    Submitted 26 November, 2022; originally announced November 2022.

  24. arXiv:2211.07302  [pdf, other]

    cs.SD cs.LG eess.AS

    MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation

    Authors: Chang-Bin Jeon, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon, Kyogu Lee

    Abstract: Separation of multiple singing voices into each voice is a rarely studied area in music source separation research. The absence of a benchmark dataset has hindered its progress. In this paper, we present an evaluation dataset and provide baseline studies for multiple singing voices separation. First, we introduce MedleyVox, an evaluation dataset for multiple singing voices separation. We specify t…

    Submitted 4 May, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures, 6 tables, To appear in ICASSP 2023 (camera-ready version)

  25. arXiv:2210.15387  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with Multi-task Learning

    Authors: Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung

    Abstract: Automatic assessment of dysarthric speech is essential for sustained treatments and rehabilitation. However, obtaining atypical speech is challenging, often leading to data scarcity issues. To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech, using the self-supervised model in conjunction with multi-task learning. Wav2vec 2.0 XLS-R is jointly traine…

    Submitted 28 April, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted to ICASSP 2023

  26. arXiv:2210.15386  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Opening the Black Box of wav2vec Feature Encoder

    Authors: Kwanghee Choi, Eun Jung Yeo

    Abstract: Self-supervised models, namely, wav2vec and its variants, have shown promising results in various downstream tasks in the speech domain. However, their inner workings are poorly understood, calling for in-depth analyses on what the model learns. In this paper, we concentrate on the convolutional feature encoder where its latent space is often speculated to represent discrete acoustic units. To ana…

    Submitted 27 October, 2022; originally announced October 2022.

  27. arXiv:2209.12942  [pdf]

    cs.CL cs.SD eess.AS

    Cross-lingual Dysarthria Severity Classification for English, Korean, and Tamil

    Authors: Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung

    Abstract: This paper proposes a cross-lingual classification method for English, Korean, and Tamil, which employs both language-independent features and language-unique features. First, we extract thirty-nine features from diverse speech dimensions such as voice quality, pronunciation, and prosody. Second, feature selections are applied to identify the optimal feature set for each language. A set of shared…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 9 pages, 4 figures, APSIPA 2022

  28. arXiv:2209.06419  [pdf, other]

    cs.IT eess.SP

    Frequency Reversal Alamouti Code-Based FBMC with Resilience to Inter-Antenna Frequency Offsets

    Authors: Cheng-Yu Lin, Borching Su, Kwonhue Choi

    Abstract: Transmit diversity schemes for filter bank multicarrier (FBMC) are known to be challenging. No existing schemes have considered the presence of inter-antenna frequency offset (IAFO), which will result in performance degradation. In this letter, a new transmit scheme based on the frequency reversal Alamouti code (FRAC)-based structure to address the issue of IAFO is proposed and is proven to inhere…

    Submitted 14 September, 2022; originally announced September 2022.

  29. Foundations of Wireless Information and Power Transfer: Theory, Prototypes, and Experiments

    Authors: Bruno Clerckx, Junghoon Kim, Kae Won Choi, Dong In Kim

    Abstract: As wireless has disrupted communications, wireless will also disrupt the delivery of energy. Future wireless networks will be equipped with (radiative) wireless power transfer (WPT) capability and exploit radio waves to carry both energy and information through a unified wireless information and power transfer (WIPT). Such networks will make the best use of the RF spectrum and radiation as well as…

    Submitted 8 September, 2022; originally announced September 2022.

    Journal ref: in Proceedings of the IEEE, vol. 110, no. 1, pp. 8-30, Jan. 2022, doi: 10.1109/JPROC.2021.3132369

  30. arXiv:2207.10760  [pdf, ps, other]

    cs.SD cs.AI cs.MM eess.AS

    A Proposal for Foley Sound Synthesis Challenge

    Authors: Keunwoo Choi, Sangshin Oh, Minsung Kang, Brian McFee

    Abstract: "Foley" refers to sound effects that are added to multimedia during post-production to enhance its perceived acoustic properties, e.g., by simulating the sounds of footsteps, ambient environmental sounds, or visible objects on the screen. While foley is traditionally produced by foley artists, there is increasing interest in automatic or machine-assisted techniques building upon recent advances in…

    Submitted 21 July, 2022; originally announced July 2022.

  31. arXiv:2206.12638  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Distilling a Pretrained Language Model to a Multilingual ASR Model

    Authors: Kwanghee Choi, Hyung-Min Park

    Abstract: Multilingual speech data often suffer from long-tailed language distribution, resulting in performance degradation. However, multilingual text data is much easier to obtain, yielding a more useful general language model. Hence, we are motivated to distill the rich knowledge embedded inside a well-trained teacher text model to the student speech model. We propose a novel method called the Distillin…

    Submitted 25 June, 2022; originally announced June 2022.

    Comments: Accepted to Interspeech 2022. Official implementation provided in https://github.com/juice500ml/xlm_to_xlsr

  32. arXiv:2206.10805  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Jointist: Joint Learning for Multi-instrument Transcription and Its Applications

    Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Amy Hung, Ju-Chiang Wang, Dorien Herremans

    Abstract: In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of the instrument recognition module that conditions the other modules: the transcription module that outputs instrument-specific piano rolls, and the source separation module that utiliz…

    Submitted 28 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Submitted to ISMIR

  33. arXiv:2205.00853  [pdf]

    eess.IV cs.LG

    Lightweight Image Enhancement Network for Mobile Devices Using Self-Feature Extraction and Dense Modulation

    Authors: Sangwook Baek, Yongsup Park, Youngo Park, Jungmin Lee, Kwangpyo Choi

    Abstract: Convolutional neural network (CNN) based image enhancement methods such as super-resolution and detail enhancement have achieved remarkable performances. However, amounts of operations including convolution and parameters within the networks cost high computing power and need huge memory resource, which limits the applications with on-device requirements. Lightweight image enhancement network shou…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: 8 pages, 9 figures

  34. arXiv:2204.03778  [pdf, other]

    eess.SP q-bio.NC

    Mitigating Mismatch Compression in Differential Local Field Potentials

    Authors: Vineet Tiruvadi, Sam James, Bryan Howell, Mosadoluwa Obatusin, Andrea Crowell, Patricio Riva-Posse, Ki Sueng Choi, Allison Waters, Robert E. Gross, Cameron C. McIntyre, Helen S. Mayberg, Robert Butera

    Abstract: Bidirectional deep brain stimulation (bdDBS) devices capable of recording differential local field potentials (dLFP) enable neural recordings alongside clinical therapy. Efforts to identify objective signals of various brain disorders, or disease readouts, are challenging in dLFP, especially during active DBS. In this report we identified, characterized, and mitigated a major source of distortion…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: 9 pages, 9 figures

  35. arXiv:2112.06334  [pdf, other]

    eess.IV cs.CV

    DPICT: Deep Progressive Image Compression Using Trit-Planes

    Authors: Jae-Han Lee, Seungmin Jeon, Kwang Pyo Choi, Youngo Park, Chang-Su Kim

    Abstract: We propose the deep progressive image compression using trit-planes (DPICT) algorithm, which is the first learning-based codec supporting fine granular scalability (FGS). First, we transform an image into a latent tensor using an analysis network. Then, we represent the latent tensor in ternary digits (trits) and encode it into a compressed bitstream trit-plane by trit-plane in the decreasing orde…

    Submitted 6 May, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Comments: Accepted to CVPR 2022 (Oral presentation)

    MSC Class: 94A08 (Primary) 68T07; 68P30; 68U10 (Secondary) ACM Class: I.4.2; I.4.9

  36. arXiv:2111.13457  [pdf, other]

    cs.SD eess.AS

    Semi-Supervised Music Tagging Transformer

    Authors: Minz Won, Keunwoo Choi, Xavier Serra

    Abstract: We present Music Tagging Transformer that is trained with a semi-supervised approach. The proposed model captures local acoustic characteristics in shallow convolutional layers, then temporally summarizes the sequence of the extracted features using stacked self-attention layers. Through a careful model assessment, we first show that the proposed architecture outperforms the previous state-of-the-…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: International Society for Music Information Retrieval (ISMIR) 2021

  37. arXiv:2111.11636  [pdf]

    cs.SD cs.IR eess.AS

    Music Classification: Beyond Supervised Learning, Towards Real-world Applications

    Authors: Minz Won, Janne Spijkervet, Keunwoo Choi

    Abstract: Music classification is a music information retrieval (MIR) task to classify music items to labels such as genre, mood, and instruments. It is also closely related to other concepts such as music similarity and musical preference. In this tutorial, we put our focus on two directions - the recent training schemes beyond supervised learning and the successful application of music classification mode…

    Submitted 2 December, 2021; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: This is a web book written for a tutorial session of the 22nd International Society for Music Information Retrieval Conference, Nov 8-12, 2021. Please visit https://music-classification.github.io/tutorial/ for the original, web book format

  38. arXiv:2111.08457  [pdf]

    eess.SP cs.LG

    A Novel TSK Fuzzy System Incorporating Multi-view Collaborative Transfer Learning for Personalized Epileptic EEG Detection

    Authors: Andong Li, Zhaohong Deng, Qiongdan Lou, Kup-Sze Choi, Hongbin Shen, Shitong Wang

    Abstract: In clinical practice, electroencephalography (EEG) plays an important role in the diagnosis of epilepsy. EEG-based computer-aided diagnosis of epilepsy can greatly improve the accuracy of epilepsy detection while reducing the workload of physicians. However, there are many challenges in practical applications for personalized epileptic EEG detection (i.e., training of detection model for a specif…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: Submitted to IEEE Trans

  39. arXiv:2110.14131  [pdf, other]

    cs.SD cs.LG eess.AS

    Temporal Knowledge Distillation for On-device Audio Classification

    Authors: Kwanghee Choi, Martin Kersner, Jacob Morton, Buru Chang

    Abstract: Improving the performance of on-device audio classification models remains a challenge given the computational limits of the mobile environment. Many studies leverage knowledge distillation to boost predictive performance by transferring the knowledge from large models to on-device models. However, most lack a mechanism to distill the essence of the temporal information, which is crucial to audio…

    Submitted 5 February, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: ICASSP 2022

  40. arXiv:2110.09127  [pdf, other]

    cs.SD cs.LG eess.AS

    SpecTNT: a Time-Frequency Transformer for Music Audio

    Authors: Wei-Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song

    Abstract: Transformers have drawn attention in the MIR field for their remarkable performance shown in natural language processing and computer vision. However, prior works in the audio processing domain mostly use Transformer as a temporal feature aggregator that acts similar to RNNs. In this paper, we propose SpecTNT, a Transformer-based architecture to model both spectral and temporal sequences of an inp…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: 6 pages

    Journal ref: International Society for Music Information Retrieval (ISMIR) 2021

  41. arXiv:2110.02509  [pdf, other]

    eess.SY eess.SP

    Design and Implementation of 5.8GHz RF Wireless Power Transfer System

    Authors: Je Hyeon Park, Nguyen Minh Tran, Sa Il Hwang, Dong In Kim, Kae Won Choi

    Abstract: In this paper, we present a 5.8 GHz radio-frequency (RF) wireless power transfer (WPT) system that consists of 64 transmit antennas and 16 receive antennas. Unlike the inductive or resonant coupling-based near-field WPT, RF WPT has a great advantage in powering low-power internet of things (IoT) devices with its capability of long-range wireless power transfer. We also propose a beam scanning algo…

    Submitted 6 October, 2021; originally announced October 2021.

  42. arXiv:2109.05418  [pdf, other]

    cs.SD eess.AS

    Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation

    Authors: Qiuqiang Kong, Yin Cao, Haohe Liu, Keunwoo Choi, Yuxuan Wang

    Abstract: Deep neural network based methods have been successfully applied to music source separation. They typically learn a mapping from a mixture spectrogram to a set of source spectrograms, all with magnitudes only. This approach has several limitations: 1) its incorrect phase reconstruction degrades the performance, 2) it limits the magnitude of masks between 0 and 1 while we observe that 22% of time-f…

    Submitted 11 September, 2021; originally announced September 2021.

    Comments: 6 pages

    Journal ref: International Society for Music Information Retrieval (ISMIR) 2021

  43. Reconfigurable Intelligent Surface-Aided Wireless Power Transfer Systems: Analysis and Implementation

    Authors: Nguyen Minh Tran, Muhammad Miftahul Amri, Je Hyeon Park, Dong In Kim, Kae Won Choi

    Abstract: Reconfigurable intelligent surface (RIS) is a promising technology for RF wireless power transfer (WPT) as it is capable of beamforming and beam focusing without using active and power-hungry components. In this paper, we propose a multi-tile RIS beam scanning (MTBS) algorithm for powering up internet-of-things (IoT) devices. Considering the hardware limitations of the IoT devices, the proposed al…

    Submitted 13 March, 2022; v1 submitted 12 June, 2021; originally announced June 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  44. arXiv:2103.05158  [pdf]

    cs.CV cs.AI eess.IV

    Deep Learning-based High-precision Depth Map Estimation from Missing Viewpoints for 360 Degree Digital Holography

    Authors: Hakdong Kim, Heonyeong Lim, Minkyu Jee, Yurim Lee, Jisoo Jeong, Kyudam Choi, MinSung Yoon, Cheongwon Kim

    Abstract: In this paper, we propose a novel, convolutional neural network model to extract highly precise depth maps from missing viewpoints, especially well applicable to generate holographic 3D contents. The depth map is an essential element for phase extraction which is required for synthesis of computer-generated hologram (CGH). The proposed model called the HDD Net uses MSE for the better performance o…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: 12 pages, 10 figures, 5 tables

  45. arXiv:2103.01893  [pdf, other]

    cs.SD cs.CL eess.AS

    Listen, Read, and Identify: Multimodal Singing Language Identification of Music

    Authors: Keunwoo Choi, Yuxuan Wang

    Abstract: We propose a multimodal singing language classification model that uses both audio content and textual metadata. LRID-Net, the proposed model, takes an audio signal and a language probability vector estimated from the metadata and outputs the probabilities of the target languages. Optionally, LRID-Net is facilitated with modality dropouts to handle a missing modality. In the experiment, we trained…

    Submitted 27 July, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: ISMIR 2021 camera-ready

  46. arXiv:2010.14805  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Large-Scale MIDI-based Composer Classification

    Authors: Qiuqiang Kong, Keunwoo Choi, Yuxuan Wang

    Abstract: Music classification is a task to classify a music piece into labels such as genres or composers. We propose large-scale MIDI based composer classification systems using GiantMIDI-Piano, a transcription-based dataset. We propose to use piano rolls, onset rolls, and velocity rolls as input representations and use deep neural networks as classifiers. To our knowledge, we are the first to investigate…

    Submitted 28 October, 2020; originally announced October 2020.

  47. arXiv:2010.00823  [pdf, other]

    cs.SD cs.LG eess.AS

    Deep Composer Classification Using Symbolic Representation

    Authors: Sunghyeon Kim, Hyeyoon Lee, Sunjong Park, Jinho Lee, Keunwoo Choi

    Abstract: In this study, we train deep neural networks to classify composer on a symbolic domain. The model takes a two-channel two-dimensional input, i.e., onset and note activations of time-pitch representation, which is converted from MIDI recordings and performs a single-label classification. On the experiments conducted on MAESTRO dataset, we report an F1 value of 0.8333 for the classification of 13 cl…

    Submitted 26 October, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

  48. arXiv:2007.12581  [pdf, other]

    eess.AS cs.LG cs.SD

    Dereverberation using joint estimation of dry speech signal and acoustic system

    Authors: Sanna Wager, Keunwoo Choi, Simon Durand

    Abstract: The purpose of speech dereverberation is to remove quality-degrading effects of a time-invariant impulse response filter from the signal. In this report, we describe an approach to speech dereverberation that involves joint estimation of the dry speech signal and of the room impulse response. We explore deep learning models that apply to each task separately, and how these can be combined in a joi…

    Submitted 24 July, 2020; originally announced July 2020.

  49. arXiv:2003.13255  [pdf, other]

    eess.SP cs.IT

    Joint Orthogonal Band and Power Allocation for Energy Fairness in WPT System with Nonlinear Logarithmic Energy Harvesting Model

    Authors: Jaeseob Han, Gyeong Ho Lee, Sangdon Park, Jun Kyun Choi

    Abstract: Wireless power transmission (WPT) is expected to play an important role in the Internet of Things services by providing the perpetual operation of IoT sensors. However, to prolong the IoT network's lifetime, the efficient resource allocation algorithm is required, in particular, the energy fairness issue among IoT sensors has been a critical challenge of the WPT system. In this paper, considering…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: 12 pages, 27 figures

  50. arXiv:1912.05537  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Encoding Musical Style with Transformer Autoencoders

    Authors: Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel

    Abstract: We consider the problem of learning high-level controls over the global structure of generated sequences, particularly in the context of symbolic music generation with complex language models. In this work, we present the Transformer autoencoder, which aggregates encodings of the input data across time to obtain a global representation of style from a given performance. We show it is possible to c…

    Submitted 30 June, 2020; v1 submitted 10 December, 2019; originally announced December 2019.