Showing 1–50 of 63 results for author: Kang, S

Searching in archive eess.
  1. arXiv:2405.11807  [pdf, other]

    cs.HC cs.RO eess.SY

    Dual-sided Peltier Elements for Rapid Thermal Feedback in Wearables

    Authors: Seongjun Kang, Gwangbin Kim, Seokhyun Hwang, Jeongju Park, Ahmed Elsharkawy, SeungJun Kim

    Abstract: This paper introduces a motor-driven Peltier device designed to deliver immediate thermal sensations within extended reality (XR) environments. The system incorporates eight motor-driven Peltier elements, facilitating swift transitions between warm and cool sensations by rotating preheated or cooled elements to opposite sides. A multi-layer structure, comprising aluminum and silicone layers, ensur…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 3 pages, 4 figures, ICRA Wearable Workshop 2024 - 1st Workshop on Advancing Wearable Devices and Applications through Novel Design, Sensing, Actuation, and AI

  2. arXiv:2403.14126  [pdf, other]

    eess.SP

    Sub-Nyquist Sampling OFDM Radar With a Time-Frequency Phase-Coded Waveform

    Authors: Seonghyeon Kang, Kawon Han, Songcheol Hong

    Abstract: This paper presents a time-frequency phase-coded sub-Nyquist sampling orthogonal frequency division multiplexing (PC-SNS-OFDM) radar system to reduce the analog-to-digital converter (ADC) sampling rate without any additional hardware or signal processing. The proposed radar divides the transmitted OFDM signal into multiple sub-bands along the frequency axis and provides orthogonality to these sub-…

    Submitted 21 March, 2024; originally announced March 2024.

  3. arXiv:2402.16153  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  4. arXiv:2402.03517  [pdf, other]

    cs.IT cs.NI eess.SP

    Spatially Consistent Air-to-Ground Channel Modeling via Generative Neural Networks

    Authors: Amedeo Giuliani, Rasoul Nikbakht, Giovanni Geraci, Seongjoon Kang, Angel Lozano, Sundeep Rangan

    Abstract: This article proposes a generative neural network architecture for spatially consistent air-to-ground channel modeling. The approach considers the trajectories of uncrewed aerial vehicles along typical urban paths, capturing spatial dependencies within received signal strength (RSS) sequences from multiple cellular base stations (gNBs). Through the incorporation of conditioning data, the model acc…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: To appear in IEEE Wireless Communications Letters

  5. arXiv:2401.13276  [pdf, other]

    eess.AS

    SCNet: Sparse Compression Network for Music Source Separation

    Authors: Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng

    Abstract: Deep learning-based methods have made significant achievements in music source separation. However, obtaining good results while maintaining a low model complexity remains challenging in super wide-band music source separation. Previous works either overlook the differences in subbands or inadequately address the problem of information loss when generating subband features. In this paper, we propo…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  6. arXiv:2401.07532  [pdf, other]

    cs.SD cs.AI eess.AS

    Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation

    Authors: Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng

    Abstract: Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention. Nevertheless, previous VAEs still encounter issues with overly long feature sequences and generated results lack contextual coherence, thus the challenge of modeling long multi-track symbolic music still re…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  7. arXiv:2312.06637  [pdf, other]

    eess.SP

    A Geometry-based Stochastic Wireless Channel Model using Channel Images

    Authors: Seongjoon Kang

    Abstract: Due to the high complexity of geometry-deterministic wireless channel modeling and the difficulty in its implementation, geometry-based stochastic channel modeling (GBSM) approaches have been used to evaluate wireless systems. This paper introduces a new method to model any GBSM by training a generative neural network using images formed by channel parameters. In this work, we obtain channel param…

    Submitted 21 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  8. arXiv:2311.12965  [pdf, other]

    eess.SY

    Terrestrial-Satellite Spectrum Sharing in the Upper Mid-Band with Interference Nulling

    Authors: Seongjoon Kang, Giovanni Geraci, Marco Mezzavilla, Sundeep Rangan

    Abstract: The growing demand for broader bandwidth in cellular networks has turned the upper mid-band (7-24 GHz) into a focal point for expansion. However, the integration of terrestrial cellular and incumbent satellite services, particularly in the 12 GHz band, poses significant interference challenges. This paper investigates the interference dynamics in terrestrial-satellite coexistence scenarios and int…

    Submitted 6 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  9. arXiv:2310.19264  [pdf, other]

    cs.MM cs.SD eess.AS

    Sound of Story: Multi-modal Storytelling with Audio

    Authors: Jaeyeon Bae, Seokhoon Jeong, Seokun Kang, Namgi Han, Jae-Yon Lee, Hyounghun Kim, Taehwan Kim

    Abstract: Storytelling is multi-modal in the real world. When one tells a story, one may use all of the visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend story understanding and telling areas by establishing a new…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023, project: https://github.com/Sosdatasets/SoS_Dataset/

  10. arXiv:2310.04010  [pdf, other]

    cs.CV cs.AI eess.IV

    Excision And Recovery: Visual Defect Obfuscation Based Self-Supervised Anomaly Detection Strategy

    Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Yeonho Lee, Hyeong Seok Kim, Juneho Yi

    Abstract: Due to scarcity of anomaly situations in the early manufacturing stage, an unsupervised anomaly detection (UAD) approach is widely adopted which only uses normal samples for training. This approach is based on the assumption that the trained UAD model will accurately reconstruct normal patterns but struggles with unseen anomalous patterns. To enhance the UAD performance, reconstruction-by-inpainti…

    Submitted 9 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 figures, 5 tables

  11. arXiv:2309.13077  [pdf, other]

    cs.LG cs.AI eess.IV

    A Differentiable Framework for End-to-End Learning of Hybrid Structured Compression

    Authors: Moonjung Eo, Suhyun Kang, Wonjong Rhee

    Abstract: Filter pruning and low-rank decomposition are two of the foundational techniques for structured compression. Although recent efforts have explored hybrid approaches aiming to integrate the advantages of both techniques, their performance gains have been modest at best. In this study, we develop a Differentiable Framework (DF) that can express filter selection, rank selection, and budget c…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 11 pages, 5 figures, 6 tables

  12. arXiv:2309.11977  [pdf, other]

    cs.SD eess.AS

    Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng

    Abstract: Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation parameters. By quantizing speech waveform into discrete acoustic tokens and modeling these tokens with the language model, recent language model-based TTS models show zero-shot speaker adaptation capabilities with only a 3-second acoustic prompt of an unseen speaker. However, they are limited by th…

    Submitted 9 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  13. Cellular Wireless Networks in the Upper Mid-Band

    Authors: Seongjoon Kang, Marco Mezzavilla, Sundeep Rangan, Arjuna Madanayake, Satheesh Bojja Venkatakrishnan, Gregory Hellbourg, Monisha Ghosh, Hamed Rahmani, Aditya Dhananjay

    Abstract: The upper mid-band - roughly from 7 to 24 GHz - has attracted considerable recent interest for new cellular services. This frequency range has vastly more spectrum than the highly congested bands below 7 GHz while offering more favorable propagation and coverage than the millimeter wave (mmWave) frequencies. The upper mid-band can thus provide a powerful and complementary frequency range to balanc…

    Submitted 6 March, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: 18 pages

  14. arXiv:2309.01615  [pdf]

    eess.SP cs.ET

    A balanced Memristor-CMOS ternary logic family and its application

    Authors: Xiao-Yuan Wang, Jia-Wei Zhou, Chuan-Tao Dong, Xin-Hui Chen, Sanjoy Kumar Nandi, Robert G. Elliman, Sung-Mo Kang, Herbert Ho-Ching Iu

    Abstract: The design of balanced ternary digital logic circuits based on memristors and conventional CMOS devices is proposed. First, balanced ternary minimum gate TMIN, maximum gate TMAX and ternary inverters are systematically designed and verified by simulation, and then logic circuits such as ternary encoders, decoders and multiplexers are designed on this basis. Two different schemes are then used to r…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 15 pages, 30 figures

  15. arXiv:2308.16836  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

    Authors: Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice. Based on the main architecture of recently proposed VISinger, we put forward several specific designs for expressive singing voice synthesis. First, dif…

    Submitted 31 August, 2023; originally announced August 2023.

  16. arXiv:2308.16593  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

    Authors: Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech an…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by INTERSPEECH 2023

  17. Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

    Authors: Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

    Abstract: For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence the speech interpretation of the target utterance, previous works on PSP mainly focus on utilizing intra-utterance linguistic information of the current utterance only. This work proposes to use in…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by Interspeech2022

  18. arXiv:2308.14595  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Neural Network Training Strategy to Enhance Anomaly Detection Performance: A Perspective on Reconstruction Loss Amplification

    Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeonho Jeong, Hyunkyu Park, Hyeong Seok Kim, Juneho Yi

    Abstract: Unsupervised anomaly detection (UAD) is a widely adopted approach in industry due to rare anomaly occurrences and data imbalance. A desirable characteristic of an UAD model is contained generalization ability which excels in the reconstruction of seen normal patterns but struggles with unseen anomalies. Recent studies have pursued to contain the generalization capability of their UAD models in rec…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 5 pages, 4 figures, 2 tables

  19. Sub-Nyquist Sampling OFDM Radar

    Authors: Kawon Han, SeongHyeon Kang, Songcheol Hong

    Abstract: In this paper, we propose a sub-Nyquist sampling (SNS) orthogonal frequency-division multiplexing (OFDM) radar system capable of reducing the analog-to-digital converter (ADC) sampling rate in OFDM radar without any additional manipulations of its hardware and waveform. To this end, the proposed system utilizes the ADC sampling rate of B/L to sample the received baseband signal with a bandwidth of…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: 12 pages, 13 figures

    Journal ref: IEEE Transactions on Radar Systems, vol. 1, pp. 669-680, 2023

  20. arXiv:2307.16012  [pdf, other]

    cs.SD eess.AS

    MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng

    Abstract: Expressive speech synthesis is crucial for many human-computer interaction scenarios, such as audiobooks, podcasts, and voice assistants. Previous works focus on predicting the style embeddings at one single scale from the information within the current sentence. Whereas, context information in neighboring sentences and multi-scale nature of style in human speech are neglected, making it challengi…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  21. arXiv:2304.12704  [pdf, other]

    cs.SD cs.MM eess.AS

    GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

    Authors: Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Music-driven 3D dance generation has become an intensive research topic in recent years with great potential for real-world applications. Most existing methods lack the consideration of genre, which results in genre inconsistency in the generated dance movements. In addition, the correlation between the dance genre and the music has not been investigated. To address these issues, we propose a genr…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted by ICASSP2023. Demo page: https://im1eon.github.io/ICASSP23-GTNB-DG/

  22. arXiv:2304.09607  [pdf, other]

    cs.SD cs.CL eess.AS

    CB-Conformer: Contextual biasing Conformer for biased word recognition

    Authors: Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Due to the mismatch between the source and target domains, how to better utilize the biased word information to improve the performance of the automatic speech recognition model in the target domain becomes a hot research topic. Previous approaches either decode with a fixed external language model or introduce a sizeable biasing module, which leads to poor adaptability and slow inference. In this…

    Submitted 25 April, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

  23. arXiv:2304.06359  [pdf, other]

    cs.SD eess.AS

    Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Recent advances in text-to-speech have significantly improved the expressiveness of synthesized speech. However, it is still challenging to generate speech with contextually appropriate and coherent speaking style for multi-sentence text in audiobooks. In this paper, we propose a context-aware coherent speaking style prediction method for audiobook speech synthesis. To predict the style embedding…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted by ICASSP 2023

  24. arXiv:2304.03295  [pdf, other]

    cs.SD cs.HC eess.AS

    Automatic Detection of Reactions to Music via Earable Sensing

    Authors: Euihyoek Lee, Chulhong Min, Jeaseung Lee, Jin Yu, Seungwoo Kang

    Abstract: We present GrooveMeter, a novel system that automatically detects vocal and motion reactions to music via earable sensing and supports music engagement-aware applications. To this end, we use smart earbuds as sensing devices, which are already widely used for music listening, and devise reaction detection techniques by leveraging an inertial measurement unit (IMU) and a microphone on earbuds. To e…

    Submitted 6 April, 2023; originally announced April 2023.

  25. arXiv:2303.10081  [pdf, other]

    math.OC cs.RO eess.SY

    Verification and Synthesis of Robust Control Barrier Functions: Multilevel Polynomial Optimization and Semidefinite Relaxation

    Authors: Shucheng Kang, Yuxiao Chen, Heng Yang, Marco Pavone

    Abstract: We study the problem of verification and synthesis of robust control barrier functions (CBF) for control-affine polynomial systems with bounded additive uncertainty and convex polynomial constraints on the control. We first formulate robust CBF verification and synthesis as multilevel polynomial optimization problems (POP), where verification optimizes -- in three levels -- the uncertainty, contro…

    Submitted 21 July, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE Conference on Decision and Control (CDC) 2023

  26. arXiv:2303.07206  [pdf]

    cs.CY eess.SY

    Toward A Dynamic Comfort Model for Human-Building Interaction in Grid-Interactive Efficient Buildings: Supported by Field Data

    Authors: SungKu Kang, Kunind Sharma, Maharshi Pathak, Emily Casavant, Katherine Bassett, Misha Pavel, David Fannon, Michael Kane

    Abstract: Controlling building electric loads could alleviate the increasing grid strain caused by the adoption of renewables and electrification. However, current approaches that automatically setback thermostats on the hottest day compromise their efficacy by neglecting human-building interaction (HBI). This study aims to define challenges and opportunities for developing engineering models of HBI to be u…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: 17 pages, 11 figures

  27. arXiv:2301.06200  [pdf, other]

    eess.SP cs.LG

    Efficiently Computing Sparse Fourier Transforms of $q$-ary Functions

    Authors: Yigit Efe Erginbas, Justin Singh Kang, Amirali Aghazadeh, Kannan Ramchandran

    Abstract: Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences. Real-world functions often have structures that manifest in a sparse Fourier transform, and previous works have shown that under the assumption of sparsity the transform can be computed efficiently. But what if we want to compute the Fourier transform of functions defined over a $q$-a…

    Submitted 15 January, 2023; originally announced January 2023.

    Comments: 29 pages, 3 figures

  28. arXiv:2209.08964  [pdf, other]

    eess.SY

    Coexistence of UAVs and Terrestrial Users in Millimeter-Wave Urban Networks

    Authors: Seongjoon Kang, Marco Mezzavilla, Angel Lozano, Giovanni Geraci, Sundeep Rangan, Vasilii Semkin, William Xia, Giuseppe Loianno

    Abstract: 5G millimeter-wave (mmWave) cellular networks are in the early phase of commercial deployments and present a unique opportunity for robust, high-data-rate communication to unmanned aerial vehicles (UAVs). A fundamental question is whether and how mmWave networks designed for terrestrial users should be modified to serve UAVs. The paper invokes realistic cell layouts, antenna patterns, and channel…

    Submitted 20 September, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

  29. arXiv:2208.13454  [pdf, other]

    eess.SY

    Minimum Input Design for Direct Data-driven Property Identification of Unknown Linear Systems

    Authors: Shubo Kang, Keyou You

    Abstract: In a direct data-driven approach, this paper studies the property identification (ID) problem to analyze whether an unknown linear system has a property of interest, e.g., stabilizability and structural properties. In sharp contrast to the model-based analysis, we approach it by directly using the input and state feedback data of the unknown system. Via a new concept of sufficient richness of…

    Submitted 29 August, 2022; originally announced August 2022.

  30. arXiv:2207.00934  [pdf, other]

    cs.RO cs.LG eess.SP

    Wireless Channel Prediction in Partially Observed Environments

    Authors: Mingsheng Yin, Yaqi Hu, Tommy Azzino, Seongjoon Kang, Marco Mezzavilla, Sundeep Rangan

    Abstract: Site-specific radio frequency (RF) propagation prediction increasingly relies on models built from visual data such as cameras and LIDAR sensors. When operating in dynamic settings, the environment may only be partially observed. This paper introduces a method to extract statistical channel models, given partial observations of the surrounding environment. We propose a simple heuristic algorithm t…

    Submitted 2 July, 2022; originally announced July 2022.

  31. arXiv:2204.02743  [pdf, other]

    cs.SD eess.AS

    Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Previous works on expressive speech synthesis focus on modelling the mono-scale style embedding from the current sentence or context, but the multi-scale nature of speaking style in human speech is neglected. In this paper, we propose a multi-scale speaking style modelling method to capture and predict multi-scale speaking style for improving the naturalness and expressiveness of synthetic speech.…

    Submitted 5 July, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2022

  32. arXiv:2203.12813  [pdf, other]

    cs.SD cs.CL eess.AS

    Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion

    Authors: Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng

    Abstract: Non-parallel data voice conversion (VC) has achieved considerable breakthroughs recently through introducing bottleneck features (BNFs) extracted by the automatic speech recognition (ASR) model. However, selection of BNFs has a significant impact on VC result. For example, when extracting BNFs from ASR trained with Cross Entropy loss (CE-BNFs) and feeding into neural network to train a VC system,…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted by ICASSP 2022

  33. arXiv:2203.12201  [pdf, other]

    cs.SD cs.CL eess.AS

    Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Previous works on expressive speech synthesis mainly focus on current sentence. The context in adjacent sentences is neglected, resulting in inflexible speaking style for the same text, which lacks speech variations. In this paper, we propose a hierarchical framework to model speaking style from context. A hierarchical context encoder is proposed to explore a wider range of contextual information…

    Submitted 6 April, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted by ICASSP 2022

  34. arXiv:2203.12188  [pdf, other]

    cs.SD cs.AI eess.AS

    FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement

    Authors: Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Previously proposed FullSubNet has achieved outstanding performance in Deep Noise Suppression (DNS) Challenge and attracted much attention. However, it still encounters issues such as input-output mismatch and coarse processing for frequency bands. In this paper, we propose an extended single-channel real-time speech enhancement framework called FullSubNet+ with following significant improvements.…

    Submitted 26 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted by ICASSP 2022

  35. arXiv:2203.05125  [pdf, ps, other]

    eess.SP math.OC

    A Lifted $\ell_1$ Framework for Sparse Recovery

    Authors: Yaghoub Rahimi, Sung Ha Kang, Yifei Lou

    Abstract: Motivated by re-weighted $\ell_1$ approaches for sparse recovery, we propose a lifted $\ell_1$ (LL1) regularization which is a generalized form of several popular regularizations in the literature. By exploring such connections, we discover there are two types of lifting functions which can guarantee that the proposed approach is equivalent to the $\ell_0$ minimization. Computationally, we design…

    Submitted 12 May, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: 24 pages

    MSC Class: 65K10; 49N45; 65F50; 90C90; 49M20

  36. arXiv:2201.00229  [pdf, other]

    eess.SP

    Understanding Energy Efficiency and Interference Tolerance in Millimeter Wave Receivers

    Authors: Panagiotis Skrimponis, Seongjoon Kang, Abbas Khalili, Wonho Lee, Navid Hosseinzadeh, Marco Mezzavilla, Elza Erkip, Mark J. W. Rodwell, James F. Buckwalter, Sundeep Rangan

    Abstract: Power consumption is a key challenge in millimeter wave (mmWave) receiver front-ends, due to the need to support high dimensional antenna arrays at wide bandwidths. Recently, there has been considerable work in developing low-power front-ends, often based on low-resolution ADCs and low-power mixers. A critical but less studied consequence of such designs is the relatively low-dynamic range which i…

    Submitted 1 January, 2022; originally announced January 2022.

    Comments: Appeared at the Asilomar Conference on Signals, Systems, and Computers 2021

  37. arXiv:2110.03396  [pdf, other]

    eess.IV cs.CV

    AnoSeg: Anomaly Segmentation Network Using Self-Supervised Learning

    Authors: Jouwon Song, Kyeongbo Kong, Ye-In Park, Seong-Gyun Kim, Suk-Ju Kang

    Abstract: Anomaly segmentation, which localizes defective areas, is an important component in large-scale industrial manufacturing. However, most recent researches have focused on anomaly detection. This paper proposes a novel anomaly segmentation network (AnoSeg) that can directly generate an accurate anomaly map using self-supervised learning. For highly accurate anomaly segmentation, the proposed AnoSeg…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 10 pages, 17 figures

  38. arXiv:2107.04526  [pdf, ps, other]

    cs.NI eess.SY

    A Dual-Connection based Handover Scheme for Ultra-Dense Millimeter-Wave Cellular Networks

    Authors: Seongjoon Kang, Siyoung Choi, Goodsol Lee, Saewoong Bahk

    Abstract: Mobile users in an ultra-dense millimeter-wave cellular network experience handover events more frequently than in conventional networks, which results in increased service interruption time and performance degradation due to blockages. Multi-connectivity has been proposed to resolve this, and it also extends the coverage of millimeter-wave communications. In this paper, we propose a dual-connecti…

    Submitted 9 July, 2021; originally announced July 2021.

  39. arXiv:2107.03298  [pdf, other]

    cs.SD cs.MM eess.AS

    VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis

    Authors: Hui Lu, Zhiyong Wu, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng

    Abstract: This paper describes a variational auto-encoder based non-autoregressive text-to-speech (VAENAR-TTS) model. The autoregressive TTS (AR-TTS) models based on the sequence-to-sequence architecture can generate high-quality speech, but their sequential decoding process can be time-consuming. Recently, non-autoregressive TTS (NAR-TTS) models have been shown to be more efficient with the parallel decodi…

    Submitted 7 July, 2021; originally announced July 2021.

  40. arXiv:2104.04600  [pdf, ps, other]

    eess.SP

    Millimeter-Wave UAV Coverage in Urban Environments

    Authors: Seongjoon Kang, Marco Mezzavilla, Angel Lozano, Giovanni Geraci, William Xia, Sundeep Rangan, Vasilii Semkin, Giuseppe Loianno

    Abstract: With growing interest in mmWave connectivity for UAVs, a basic question is whether networks intended for terrestrial users can provide sufficient aerial coverage as well. To assess this possibility, the paper proposes a novel evaluation methodology using generative models trained on detailed ray tracing data. These models capture complex propagation characteristics and can be readily combined with…

    Submitted 19 May, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

  41. arXiv:2103.17149  [pdf, other]

    eess.SP

    Lightweight UAV-based Measurement System for Air-to-Ground Channels at 28 GHz

    Authors: Vasilii Semkin, Seongjoon Kang, Jaakko Haarla, William Xia, Ismo Huhtinen, Giovanni Geraci, Angel Lozano, Giuseppe Loianno, Marco Mezzavilla, Sundeep Rangan

    Abstract: Wireless communication at millimeter wave frequencies has attracted considerable attention for the delivery of high-bit-rate connectivity to unmanned aerial vehicles (UAVs). However, conducting the channel measurements necessary to assess communication at these frequencies has been challenging due to the severe payload and power restrictions in commercial UAVs. This work presents a novel lightweig…

    Submitted 31 March, 2021; originally announced March 2021.

  42. arXiv:2102.06536  [pdf, other]

    cs.AR eess.IV eess.SP

    CrossStack: A 3-D Reconfigurable RRAM Crossbar Inference Engine

    Authors: Jason K. Eshraghian, Kyoungrok Cho, Sung Mo Kang

    Abstract: Deep neural network inference accelerators are rapidly growing in importance as we turn to massively parallelized processing beyond GPUs and ASICs. The dominant operation in feedforward inference is the multiply-and-accumulate process, where each column in a crossbar generates the current response of a single neuron. As a result, memristor crossbar arrays parallelize inference and image processing…

    Submitted 7 February, 2021; originally announced February 2021.

    Comments: 5 pages, 4 figures

  43. arXiv:2102.00184  [pdf, other]

    eess.AS cs.LG cs.SD

    Adversarially learning disentangled speech representations for robust multi-factor voice conversion

    Authors: Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Factorizing speech as disentangled speech representations is vital to achieve highly controllable style transfer in voice conversion (VC). Conventional speech representation learning methods in VC only factorize speech as speaker and content, lacking controllability on other prosody-related factors. State-of-the-art speech representation learning methods for more speech factors are using primary di…

    Submitted 20 August, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

  44. arXiv:2010.06423  [pdf]

    q-bio.QM eess.IV

    A comprehensive protocol for manual segmentation of the human claustrum and its sub-regions using high-resolution MRI

    Authors: Seung Suk Kang, Joseph Bodenheimer, Tracey Butler

    Abstract: The claustrum (Cl) is a thin grey matter structure located in the center of each brain hemisphere. Cl has been hypothesized as a central hub of the brain for multisensory/sensorimotor integration, consciousness, and attention. Accumulating evidence has suggested that Cl might be important in the development of severe neurological and psychiatric symptoms including epileptic seizures and psychosis.…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: 15 pages, 6 figures

  45. arXiv:2010.01810  [pdf, other]

    cs.CV eess.IV

    Painting Outside as Inside: Edge Guided Image Outpainting via Bidirectional Rearrangement with Progressive Step Learning

    Authors: Kyunghun Kim, Yeohun Yun, Keon-Woo Kang, Kyeongbo Kong, Siyeong Lee, Suk-Ju Kang

    Abstract: Image outpainting is a very intriguing problem as the outside of a given image can be continuously filled by considering as the context of the image. This task has two main challenges. The first is to maintain the spatial consistency in contents of generated regions and the original input. The second is to generate a high-quality large image with a small amount of adjacent information. Conventiona…

    Submitted 9 November, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Paper accepted in WACV 2021

  46. arXiv:2006.15833  [pdf, other]

    eess.IV cs.CV

    End-to-End Differentiable Learning to HDR Image Synthesis for Multi-exposure Images

    Authors: Jung Hee Kim, Siyeong Lee, Suk-Ju Kang

    Abstract: Recently, high dynamic range (HDR) image reconstruction based on the multiple exposure stack from a given single exposure utilizes a deep learning framework to generate high-quality HDR images. These conventional networks focus on the exposure transfer task to reconstruct the multi-exposure stack. Therefore, they often fail to fuse the multi-exposure stack into a perceptually pleasant HDR image as…

    Submitted 18 December, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

  47. arXiv:2006.11610  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

    Authors: Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

    Abstract: Generating 3D speech-driven talking heads has received more and more attention in recent years. Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input. In this work, we propose a novel approach using phone…

    Submitted 20 June, 2020; originally announced June 2020.

    Comments: 5 pages, 5 figures

  48. arXiv:2005.09178  [pdf, other]

    eess.AS

    Transferring Source Style in Non-Parallel Voice Conversion

    Authors: Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

    Abstract: Voice conversion (VC) techniques aim to modify speaker identity of an utterance while preserving the underlying linguistic information. Most VC approaches ignore modeling of the speaking style (e.g. emotion and emphasis), which may contain the factors intentionally added by the speaker and should be retained during conversion. This study proposes a sequence-to-sequence based non-parallel VC approa…

    Submitted 18 May, 2020; originally announced May 2020.

    Comments: 5 pages, 8 figures, submitted to INTERSPEECH 2020

  49. arXiv:2003.09839  [pdf, other]

    eess.SY

    A New Update Rule of RLSEKF-based Joint-estimation Filters for Real-time SOH SOC Identification

    Authors: Kwangrae Kim, Minho Kim, Suwon Kang, Jungwook Yu, Jungsoo Kim, Huiyong Chun, Soohee Han

    Abstract: To accurately estimate the SOC and SOH of a lithium-ion battery used in an electric vehicle (EV), we propose an Adaptive Diagonal Forgetting Factor Recursive Least Square (ADFF-RLS) for accurate battery parameter estimation. ADFF-RLS adds two new proposals to the existing DFF-RLS: the first is an excitation tag that changes the behavior of the DFF-RLS and the EKF according to the dynami…

    Submitted 22 March, 2020; originally announced March 2020.

  50. A Fractional-Order Normalized Bouc-Wen Model for Piezoelectric Hysteresis Nonlinearity

    Authors: Shengzheng Kang, Hongtao Wu, Yao Li, Xiaolong Yang, Jiafeng Yao

    Abstract: This paper presents a new fractional-order normalized Bouc-Wen (BW) (FONBW) model to describe the asymmetric and rate-dependent hysteresis nonlinearity of piezoelectric actuators (PEAs). In view of the fact that the classical BW (CBW) model is only efficient for the symmetric and rate-independent hysteresis description, the FONBW model is devoted to characterizing the asymmetric and rate-dependent…

    Submitted 20 November, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: 9 pages, 10 figures, submitted to TMech; 10 pages, 11 figures, add two subsections in Section IV; modify Tables I and III, and Figures 9 and 10

    Journal ref: IEEE/ASME Transactions on Mechatronics, 2021