-
Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation
Authors:
Ye** Jeon,
Yunsu Kim,
Gary Geunbae Lee
Abstract:
Contemporary neural speech synthesis models have indeed demonstrated remarkable proficiency in synthetic speech generation as they have attained a level of quality comparable to that of human-produced speech. Nevertheless, it is important to note that these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, the Tacotron and FastSpeech variants show substantial pausing errors when applied to the Korean language, which affects speech perception and naturalness. In order to address the aforementioned issues, we propose a novel framework that incorporates comprehensive modeling of both syntactic and acoustic cues that are associated with pausing patterns. Remarkably, our framework possesses the capability to consistently generate natural speech even for considerably more extended and intricate out-of-domain (OOD) sentences, despite its training on short audio clips. Architectural design choices are validated through comparisons with baseline models and ablation studies using subjective and objective metrics, thus confirming model performance.
Submitted 3 April, 2024;
originally announced April 2024.
-
AIC-UNet: Anatomy-informed Cascaded UNet for Robust Multi-Organ Segmentation
Authors:
Young Seok Jeon,
Hongfei Yang,
Huazhu Fu,
Mengling Feng
Abstract:
Imposing key anatomical features, such as the number of organs, their shapes, sizes, and relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include broadening the effective receptive field (ERF) size with resource- and data-intensive modules such as self-attention or introducing organ-specific topology regularizers, which may not scale to multi-organ segmentation problems where inter-organ relations also play a major role. We introduce a new approach to impose anatomical constraints on any existing encoder-decoder segmentation model by conditioning model prediction on a learnable anatomy prior. More specifically, given an abdominal scan, a part of the encoder spatially warps a learnable prior to align with the given input scan using thin plate spline (TPS) grid interpolation. The warped prior is then integrated during the decoding phase to guide the model for more anatomy-informed predictions. Code is available at https://anonymous.4open.science/r/AIC-UNet-7048.
Submitted 27 March, 2024;
originally announced March 2024.
-
Vector Quantization for Deep-Learning-Based CSI Feedback in Massive MIMO Systems
Authors:
Junyong Shin,
Yu** Kang,
Yo-Seb Jeon
Abstract:
This paper presents a finite-rate deep-learning (DL)-based channel state information (CSI) feedback method for massive multiple-input multiple-output (MIMO) systems. The presented method provides a finite-bit representation of the latent vector based on a vector-quantized variational autoencoder (VQ-VAE) framework while reducing its computational complexity based on shape-gain vector quantization. In this method, the magnitude of the latent vector is quantized using a non-uniform scalar codebook with a proper transformation function, while the direction of the latent vector is quantized using a trainable Grassmannian codebook. A multi-rate codebook design strategy is also developed by introducing a codeword selection rule for a nested codebook along with the design of a loss function. Simulation results demonstrate that the proposed method reduces the computational complexity associated with VQ-VAE while improving CSI reconstruction performance under a given feedback overhead.
Submitted 12 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
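The shape-gain split described in this abstract can be illustrated with a minimal sketch. It assumes plain Python lists, a hand-picked scalar gain codebook, and a small fixed codebook of unit vectors; the function names `shape_gain_quantize` and `shape_gain_reconstruct` are hypothetical, and the paper's transformation function, trainable Grassmannian codebook, and nested multi-rate design are not reproduced here.

```python
import math


def shape_gain_quantize(x, gain_codebook, shape_codebook):
    """Quantize x as (gain index, shape index).

    The gain is the Euclidean norm of x and the shape is x / gain.
    Nearest-gain and maximum-inner-product selection are illustrative
    assumptions, not the selection rules used in the paper.
    """
    gain = math.sqrt(sum(v * v for v in x))
    shape = [v / gain for v in x] if gain > 0.0 else list(x)

    # Scalar quantization of the magnitude.
    gain_idx = min(range(len(gain_codebook)),
                   key=lambda i: abs(gain_codebook[i] - gain))

    # Direction quantization: pick the unit codeword best aligned with the shape.
    shape_idx = max(range(len(shape_codebook)),
                    key=lambda i: sum(a * b for a, b in zip(shape, shape_codebook[i])))
    return gain_idx, shape_idx


def shape_gain_reconstruct(gain_idx, shape_idx, gain_codebook, shape_codebook):
    """Rebuild the vector from the two transmitted indices."""
    g = gain_codebook[gain_idx]
    return [g * c for c in shape_codebook[shape_idx]]


if __name__ == "__main__":
    gains = [1.0, 4.5, 8.0]
    shapes = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    gi, si = shape_gain_quantize([3.0, 4.0], gains, shapes)
    print(gi, si, shape_gain_reconstruct(gi, si, gains, shapes))
```

Searching two small codebooks separately, instead of one joint codebook over all gain and direction pairs, is what makes the split attractive from a complexity standpoint.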
-
Deep Learning-Assisted Parallel Interference Cancellation for Grant-Free NOMA in Machine-Type Communication
Authors:
Yongjeong Oh,
Jaehong Jo,
Byonghyo Shim,
Yo-Seb Jeon
Abstract:
In this paper, we present a novel approach for joint activity detection (AD), channel estimation (CE), and data detection (DD) in uplink grant-free non-orthogonal multiple access (NOMA) systems. Our approach employs an iterative and parallel interference removal strategy inspired by parallel interference cancellation (PIC), enhanced with deep learning to jointly tackle the AD, CE, and DD problems. Based on this approach, we develop three PIC frameworks, each of which is designed for either coherent or non-coherent schemes. The first framework performs joint AD and CE using received pilot signals in the coherent scheme. Building upon this framework, the second framework utilizes both the received pilot and data signals for CE, further enhancing the performance of AD, CE, and DD in the coherent scheme. The third framework is designed to accommodate the non-coherent scheme involving a small number of data bits, and it simultaneously performs AD and DD. Through joint loss functions and interference cancellation modules, our approach supports end-to-end training, contributing to enhanced performance of AD, CE, and DD for both coherent and non-coherent schemes. Simulation results demonstrate the superiority of our approach over traditional techniques, exhibiting enhanced performance of AD, CE, and DD while maintaining lower computational complexity.
Submitted 11 March, 2024;
originally announced March 2024.
-
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Authors:
Ye** Jeon,
Gary Geunbae Lee
Abstract:
This paper explores the task of language-agnostic speaker replication, a novel endeavor that seeks to replicate a speaker's voice irrespective of the language they are speaking. Towards this end, we introduce a multi-level attention aggregation approach that systematically probes and amplifies various speaker-specific attributes in a hierarchical manner. Through rigorous evaluations across a wide range of scenarios including seen and unseen speakers conversing in seen and unseen lingua, we establish that our proposed model is able to achieve substantial speaker similarity, and is able to generalize to out-of-domain (OOD) cases.
Submitted 3 April, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Authors:
Ye** Jeon,
Yunsu Kim,
Gary Geunbae Lee
Abstract:
Zero-shot multi-speaker TTS aims to synthesize speech with the voice of a chosen target speaker without any fine-tuning. Prevailing methods, however, encounter limitations in adapting to new speakers in out-of-domain settings, primarily due to inadequate speaker disentanglement and content leakage. To overcome these constraints, we propose an innovative negation feature learning paradigm that models decoupled speaker attributes as deviations from the complete audio representation by utilizing the subtraction operation. By eliminating superfluous content information from the speaker representation, our negation scheme not only mitigates content leakage, thereby enhancing synthesis robustness, but also improves speaker fidelity. In addition, to facilitate the learning of diverse speaker attributes, we leverage multi-stream Transformers, which retain multiple hypotheses and instigate a training paradigm akin to ensemble learning. To unify these hypotheses and realize the final speaker representation, we employ attention pooling. Finally, in light of the imperative to generate target text utterances in the desired voice, we adopt adaptive layer normalizations to effectively fuse the previously generated speaker representation with the target text representations, as opposed to mere concatenation of the text and audio modalities. Extensive experiments and validations substantiate the efficacy of our proposed approach in preserving and harnessing speaker-specific attributes vis-à-vis alternative baseline models.
Submitted 5 March, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Authors:
Jihyun Lee,
Ye** Jeon,
Wonjun Lee,
Yunsu Kim,
Gary Geunbae Lee
Abstract:
Dialogue state tracking (DST) plays a crucial role in extracting information in task-oriented dialogue systems. However, previous research is limited to textual modalities, primarily due to the shortage of authentic human audio datasets. We address this by investigating synthetic audio data for audio-based DST. To this end, we develop cascading and end-to-end models, train them with our synthetic audio dataset, and test them on actual human speech data. To facilitate evaluation tailored to audio modalities, we introduce a novel PhonemeF1 to capture pronunciation similarity. Experimental results showed that models trained solely on synthetic datasets can generalize their performance to human voice data. By eliminating the dependency on human speech data collection, these insights pave the way for significant practical advancements in audio-based DST. Data and code are available at https://github.com/JihyunLee1/E2E-DST.
Submitted 4 December, 2023;
originally announced December 2023.
-
Prior-Aware Robust Beam Alignment for Low-SNR Millimeter-Wave Communications
Authors:
Jihun Park,
Yongjeong Oh,
Jaewon Yun,
Seonjung Kim,
Yo-Seb Jeon
Abstract:
This paper presents a robust beam alignment technique for millimeter-wave communications in low signal-to-noise ratio (SNR) environments. The core strategy of our technique is to repeatedly transmit the most probable beam candidates to reduce beam misalignment probability induced by noise. Specifically, for a given beam training overhead, both the selection of candidates and the number of repetitions for each beam candidate are optimized based on channel prior information. To achieve this, a deep neural network is employed to learn the prior probability of the optimal beam at each location. The beam misalignment probability is then analyzed based on the channel prior, forming the basis for an optimization problem aimed at minimizing the analyzed beam misalignment probability. A closed-form solution is derived for a special case with two beam candidates, and an efficient algorithm is developed for general cases with multiple beam candidates. Simulation results using the DeepMIMO dataset demonstrate the superior performance of our technique in dynamic low-SNR communication environments when compared to existing beam alignment techniques.
Submitted 2 December, 2023;
originally announced December 2023.
-
Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset
Authors:
Yu** Jeon,
Eunsue Choi,
Youngchan Kim,
Yunseong Moon,
Khalid Omer,
Felix Heide,
Seung-Hwan Baek
Abstract:
Image datasets are essential not only in validating existing methods in computer vision but also in developing new methods. Most existing image datasets focus on trichromatic intensity images to mimic human vision. However, polarization and spectrum, the wave properties of light that animals in harsh environments and with limited brain capacity often rely on, remain underrepresented in existing datasets. Although spectro-polarimetric datasets exist, these datasets have insufficient object diversity, limited illumination conditions, linear-only polarization data, and inadequate image count. Here, we introduce two spectro-polarimetric datasets: trichromatic Stokes images and hyperspectral Stokes images. These novel datasets encompass both linear and circular polarization; they introduce multiple spectral channels; and they feature a broad selection of real-world scenes. With our dataset in hand, we analyze the spectro-polarimetric image statistics, develop efficient representations of such high-dimensional data, and evaluate spectral dependency of shape-from-polarization methods. As such, the proposed dataset promises a foundation for data-driven spectro-polarimetric imaging and vision research. Dataset and code will be publicly available.
Submitted 30 November, 2023; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Joint Source-Channel Coding for Channel-Adaptive Digital Semantic Communications
Authors:
Joohyuk Park,
Yongjeong Oh,
Seonjung Kim,
Yo-Seb Jeon
Abstract:
In this paper, we propose a novel joint source-channel coding (JSCC) approach for channel-adaptive digital semantic communications. In semantic communication systems with digital modulation and demodulation, robust design of JSCC encoder and decoder becomes challenging not only due to the unpredictable dynamics of channel conditions but also due to diverse modulation orders. To address this challenge, we first develop a new demodulation method which assesses the uncertainty of the demodulation output to improve the robustness of the digital semantic communication system. We then devise a robust training strategy which enhances the robustness and flexibility of the JSCC encoder and decoder against diverse channel conditions and modulation orders. To this end, we model the relationship between the encoder's output and decoder's input using binary symmetric erasure channels and then sample the parameters of these channels from diverse distributions. We also develop a channel-adaptive modulation technique for an inference phase, in order to reduce the communication latency while maintaining task performance. In this technique, we adaptively determine modulation orders for the latent variables based on channel conditions. Using simulations, we demonstrate the superior performance of the proposed JSCC approach for image classification, reconstruction, and retrieval tasks compared to existing JSCC approaches.
Submitted 18 March, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
SplitMAC: Wireless Split Learning over Multiple Access Channels
Authors:
Seonjung Kim,
Yongjeong Oh,
Yo-Seb Jeon
Abstract:
This paper presents a novel split learning (SL) framework, referred to as SplitMAC, which reduces the latency of SL by leveraging simultaneous uplink transmission over multiple access channels. The key strategy is to divide devices into multiple groups and allow the devices within the same group to simultaneously transmit their smashed data and device-side models over the multiple access channels. The optimization problem of device grouping to minimize SL latency is formulated, and the benefit of device grouping in reducing the uplink latency of SL is theoretically derived. By examining a two-device grouping case, two asymptotically-optimal algorithms are devised for device grouping in low and high signal-to-noise ratio (SNR) scenarios, respectively, while providing proofs of their optimality. By merging these algorithms, a near-optimal device grouping algorithm is proposed to cover a wide range of SNR. Our SL framework is also extended to consider practical fading channels and to support a general group size. Simulation results demonstrate that our SL framework with the proposed device grouping algorithm is superior to existing SL frameworks in reducing SL latency.
Submitted 19 March, 2024; v1 submitted 4 November, 2023;
originally announced November 2023.
-
Communication-Efficient Federated Learning over Capacity-Limited Wireless Networks
Authors:
Jaewon Yun,
Yongjeong Oh,
Yo-Seb Jeon,
H. Vincent Poor
Abstract:
In this paper, a communication-efficient federated learning (FL) framework is proposed for improving the convergence rate of FL under a limited uplink capacity. The central idea of the proposed framework is to transmit the values and positions of the top-$S$ entries of a local model update for uplink transmission. A lossless encoding technique is considered for transmitting the positions of these entries, while a linear transformation followed by the Lloyd-Max scalar quantization is considered for transmitting their values. For an accurate reconstruction of the top-$S$ values, a linear minimum mean squared error method is developed based on the Bussgang decomposition. Moreover, an error feedback strategy is introduced to compensate for both compression and reconstruction errors. The convergence rate of the proposed framework is analyzed for a non-convex loss function with consideration of the compression and reconstruction errors. From the analytical result, the key parameters of the proposed framework are optimized for maximizing the convergence rate for the given capacity. Simulation results on the MNIST and CIFAR-10 datasets demonstrate that the proposed framework outperforms state-of-the-art FL frameworks in terms of classification accuracy under the limited uplink capacity.
Submitted 20 July, 2023;
originally announced July 2023.
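The top-$S$ selection and error feedback steps named in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `top_s_sparsify` and `compress_with_error_feedback` are hypothetical, selection is by largest magnitude, and the lossless position encoding, Lloyd-Max quantization, and Bussgang-based LMMSE reconstruction from the paper are omitted, so only the sparsification error is fed back.

```python
def top_s_sparsify(update, s):
    """Return (positions, values) of the s largest-magnitude entries."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    positions = sorted(order[:s])
    values = [update[i] for i in positions]
    return positions, values


def compress_with_error_feedback(update, residual, s):
    """Sparsify (update + residual) and carry the dropped part forward.

    `residual` holds what earlier rounds failed to transmit; adding it
    back before selection is the error feedback step.
    """
    corrected = [u + r for u, r in zip(update, residual)]
    positions, values = top_s_sparsify(corrected, s)

    # Dense view of what is actually sent, used only to compute the new residual.
    sent = [0.0] * len(update)
    for p, v in zip(positions, values):
        sent[p] = v

    new_residual = [c - t for c, t in zip(corrected, sent)]
    return positions, values, new_residual


if __name__ == "__main__":
    residual = [0.0, 0.0, 0.0, 0.0]
    pos, vals, residual = compress_with_error_feedback([0.1, -2.0, 0.5, 3.0], residual, s=2)
    print(pos, vals, residual)
```

Entries that are too small to be selected in one round accumulate in the residual and are eventually transmitted once their accumulated magnitude is large enough.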
-
Neural 360$^\circ$ Structured Light with Learned Metasurfaces
Authors:
Eunsue Choi,
Gyeongtae Kim,
Jooyeong Yun,
Yu** Jeon,
Junsuk Rho,
Seung-Hwan Baek
Abstract:
Structured light has proven instrumental in 3D imaging, LiDAR, and holographic light projection. Metasurfaces, comprised of sub-wavelength-sized nanostructures, facilitate 180$^\circ$ field-of-view (FoV) structured light, circumventing the restricted FoV inherent in traditional optics like diffractive optical elements. However, extant metasurface-facilitated structured light exhibits sub-optimal performance in downstream tasks, due to heuristic pattern designs such as periodic dots that do not consider the objectives of the end application. In this paper, we present neural 360$^\circ$ structured light, driven by learned metasurfaces. We propose a differentiable framework, that encompasses a computationally-efficient 180$^\circ$ wave propagation model and a task-specific reconstructor, and exploits both transmission and reflection channels of the metasurface. Leveraging a first-order optimizer within our differentiable framework, we optimize the metasurface design, thereby realizing neural 360$^\circ$ structured light. We have utilized neural 360$^\circ$ structured light for holographic light projection and 3D imaging. Specifically, we demonstrate the first 360$^\circ$ light projection of complex patterns, enabled by our propagation model that can be computationally evaluated 50,000$\times$ faster than the Rayleigh-Sommerfeld propagation. For 3D imaging, we improve depth-estimation accuracy by 5.09$\times$ in RMSE compared to the heuristically-designed structured light. Neural 360$^\circ$ structured light promises robust 360$^\circ$ imaging and display for robotics, extended-reality systems, and human-computer interactions.
Submitted 27 June, 2023; v1 submitted 23 June, 2023;
originally announced June 2023.
-
MIMO Detection under Hardware Impairments: Learning with Noisy Labels
Authors:
**man Kwon,
Seunghyeon Jeon,
Yo-Seb Jeon,
H. Vincent Poor
Abstract:
This paper considers a data detection problem in multiple-input multiple-output (MIMO) communication systems with hardware impairments. To address challenges posed by nonlinear and unknown distortion in received signals, two learning-based detection methods, referred to as model-driven and data-driven, are presented. The model-driven method employs a generalized Gaussian distortion model to approximate the conditional distribution of the distorted received signal. By using the outputs of coarse data detection as noisy training data, the model-driven method avoids the need for additional training overhead beyond traditional pilot overhead for channel estimation. An expectation-maximization algorithm is devised to accurately learn the parameters of the distortion model from noisy training data. To resolve a model mismatch problem in the model-driven method, the data-driven method employs a deep neural network (DNN) for approximating a-posteriori probabilities for each received signal. This method uses the outputs of the model-driven method as noisy labels and therefore does not require extra training overhead. To avoid the overfitting problem caused by noisy labels, a robust DNN training algorithm is devised, which involves a warm-up period, sample selection, and loss correction. Simulation results demonstrate that the two proposed methods outperform existing solutions with the same overhead under various hardware impairment scenarios.
Submitted 8 June, 2023;
originally announced June 2023.
-
FCSN: Global Context Aware Segmentation by Learning the Fourier Coefficients of Objects in Medical Images
Authors:
Young Seok Jeon,
Hongfei Yang,
Mengling Feng
Abstract:
The encoder-decoder model is a commonly used Deep Neural Network (DNN) model for medical image segmentation. Conventional encoder-decoder models make pixel-wise predictions focusing heavily on local patterns around the pixel. This makes it challenging to give segmentation that preserves the object's shape and topology, which often requires an understanding of the global context of the object. In this work, we propose a Fourier Coefficient Segmentation Network (FCSN), a novel DNN-based model that segments an object by learning the complex Fourier coefficients of the object's masks. The Fourier coefficients are calculated by integrating over the whole contour. Therefore, for our model to make a precise estimation of the coefficients, the model is motivated to incorporate the global context of the object, leading to a more accurate segmentation of the object's shape. This global context awareness also makes our model robust to unseen local perturbations during inference, such as additive noise or motion blur that are prevalent in medical images. When FCSN is compared with other state-of-the-art models (UNet+, DeepLabV3+, UNETR) on 3 medical image segmentation tasks (ISIC_2018, RIM_CUP, RIM_DISC), FCSN attains significantly lower Hausdorff scores of 19.14 (6%), 17.42 (6%), and 9.16 (14%) on the 3 tasks, respectively. Moreover, FCSN is lightweight by discarding the decoder module, which incurs significant computational overhead. FCSN only requires 22.2M parameters, 82M and 10M fewer parameters than UNETR and DeepLabV3+. FCSN attains inference and training speeds of 1.6ms/img and 6.3ms/img, that is 8$\times$ and 3$\times$ faster than UNet and UNETR.
Submitted 29 July, 2022;
originally announced July 2022.
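The idea of describing an object's outline by complex Fourier coefficients computed over the whole contour can be illustrated with a small sketch. It assumes the contour is given as uniformly sampled (x, y) points and approximates the contour integral with a discrete sum; the function names are hypothetical, and this is a generic Fourier-descriptor computation rather than the network or coefficient definition used in FCSN.

```python
import cmath


def contour_fourier_coefficients(points, order):
    """Complex Fourier coefficients c_k, k = -order..order, of a closed contour.

    Each point (x, y) is treated as the complex number x + iy, and every
    coefficient sums over all contour samples, so it depends on the whole shape.
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    coeffs = {}
    for k in range(-order, order + 1):
        total = sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        coeffs[k] = total / n
    return coeffs


def reconstruct_contour(coeffs, num_points):
    """Rebuild contour samples from a (possibly truncated) set of coefficients."""
    pts = []
    for t in range(num_points):
        z = sum(c * cmath.exp(2j * cmath.pi * k * t / num_points)
                for k, c in coeffs.items())
        pts.append((z.real, z.imag))
    return pts


if __name__ == "__main__":
    # A circle of radius 2 centred at (1, 1), sampled at 16 points.
    n = 16
    circle = []
    for t in range(n):
        p = complex(1, 1) + 2 * cmath.exp(2j * cmath.pi * t / n)
        circle.append((p.real, p.imag))
    c = contour_fourier_coefficients(circle, order=2)
    print(c[0], c[1])
```

For the circle in the usage example, the zeroth coefficient recovers the centre and the first coefficient recovers the radius, which shows how a handful of coefficients can summarize global shape.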
-
Interpretable Fusion Analytics Framework for fMRI Connectivity: Self-Attention Mechanism and Latent Space Item-Response Model
Authors:
Jeong-Jae Kim,
Yeseul Jeon,
SuMin Yu,
Junggu Choi,
Sanghoon Han
Abstract:
There have been several attempts to use deep learning based on brain fMRI signals to classify cognitive impairment diseases. However, deep learning is a hidden black box model that makes it difficult to interpret the process of classification. To address this issue, we propose a novel analytical framework that interprets the classification result from deep learning processes. We first derive the region of interest (ROI) functional connectivity network (FCN) by embedding functions based on their similar signal patterns. Then, using the self-attention equipped deep learning model, we classify diseases based on their FCN. Finally, in order to interpret the classification results, we employ a latent space item-response interaction network model to identify the significant functions that exhibit distinct connectivity patterns when compared to other diseases. The application of this proposed framework to the four types of cognitive impairment shows that our approach is valid for determining the significant ROI functions.
Submitted 4 July, 2022;
originally announced July 2022.
-
MetaSSD: Meta-Learned Self-Supervised Detection
Authors:
Moon Jeong Park,
Jungseul Ok,
Yo-Seb Jeon,
Dongwoo Kim
Abstract:
Deep learning-based symbol detectors have gained increasing attention because their algorithm design is simpler than that of traditional model-based algorithms such as Viterbi and BCJR. The supervised learning framework is often employed to predict the input symbols, where training symbols are used to train the model. There are two major limitations in the supervised approaches: a) a model needs to be retrained from scratch when new training symbols arrive in order to adapt to a new channel status, and b) the length of the training symbols needs to be longer than a certain threshold to make the model generalize well on unseen symbols. To overcome these challenges, we propose a meta-learning-based self-supervised symbol detector named MetaSSD. Our contribution is two-fold: a) meta-learning helps the model adapt to a new channel environment based on experience with various meta-training environments, and b) self-supervised learning helps the model use relatively less supervision than the previously suggested learning-based detectors. In experiments, MetaSSD outperforms OFDM-MMSE with noisy channel information and shows comparable results with BCJR. Further ablation studies show the necessity of each component in our framework.
Submitted 30 May, 2022;
originally announced May 2022.
-
FedVQCS: Federated Learning via Vector Quantized Compressed Sensing
Authors:
Yongjeong Oh,
Yo-Seb Jeon,
Mingzhe Chen,
Walid Saad
Abstract:
In this paper, a new communication-efficient federated learning (FL) framework is proposed, inspired by vector quantized compressed sensing. The basic strategy of the proposed framework is to compress the local model update at each device by applying dimensionality reduction followed by vector quantization. Subsequently, the global model update is reconstructed at a parameter server by applying a sparse signal recovery algorithm to the aggregation of the compressed local model updates. By harnessing the benefits of both dimensionality reduction and vector quantization, the proposed framework effectively reduces the communication overhead of local update transmissions. Both the design of the vector quantizer and the key parameters for the compression are optimized so as to minimize the reconstruction error of the global model update under the constraint of wireless link capacity. By considering the reconstruction error, the convergence rate of the proposed framework is also analyzed for a non-convex loss function. Simulation results on the MNIST and FEMNIST datasets demonstrate that the proposed framework provides more than a 2.4% increase in classification accuracy compared to state-of-the-art FL frameworks when the communication overhead of the local model update transmission is 0.1 bit per local model entry.
Submitted 30 June, 2023; v1 submitted 15 April, 2022;
originally announced April 2022.
-
Semi-Data-Aided Channel Estimation for MIMO Systems via Reinforcement Learning
Authors:
Tae-Kyoung Kim,
Yo-Seb Jeon,
Jun Li,
Nima Tavangaran,
H. Vincent Poor
Abstract:
Data-aided channel estimation is a promising solution to improve channel estimation accuracy by exploiting data symbols as pilot signals for updating an initial channel estimate. In this paper, we propose a semi-data-aided channel estimator for multiple-input multiple-output communication systems. Our strategy is to leverage reinforcement learning (RL) for selecting reliable detected symbols among the symbols in the first part of the transmitted data block. This strategy facilitates an update of the channel estimate before the end of the data block transmission and therefore achieves a significant reduction in communication latency compared to conventional data-aided channel estimation approaches. To this end, we first define a Markov decision process (MDP) which sequentially decides whether to use each detected symbol as an additional pilot signal. We then develop an RL algorithm to efficiently find the best policy of the MDP based on a Monte Carlo tree search approach. In this algorithm, we exploit the a-posteriori probability for approximating both the optimal future actions and the corresponding state transitions of the MDP and derive a closed-form expression for the best policy. Simulation results demonstrate that the proposed channel estimator effectively mitigates both channel estimation error and detection performance loss caused by insufficient pilot signals.
Submitted 3 April, 2022;
originally announced April 2022.
-
Communication-Efficient Federated Learning via Quantized Compressed Sensing
Authors:
Yongjeong Oh,
Namyoon Lee,
Yo-Seb Jeon,
H. Vincent Poor
Abstract:
In this paper, we present a communication-efficient federated learning framework inspired by quantized compressed sensing. The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server (PS). Our strategy for gradient compression is to sequentially perform block sparsification, dimensional reduction, and quantization. Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression. For accurate aggregation of the local gradients from the compressed signals at the PS, we put forth an approximate minimum mean square error (MMSE) approach for gradient reconstruction using the expectation-maximization generalized-approximate-message-passing (EM-GAMP) algorithm. Assuming a Bernoulli Gaussian-mixture prior, this algorithm iteratively updates the posterior mean and variance of local gradients from the compressed signals. We also present a low-complexity approach for the gradient reconstruction. In this approach, we use the Bussgang theorem to aggregate local gradients from the compressed signals, then compute an approximate MMSE estimate of the aggregated gradient using the EM-GAMP algorithm. We also provide a convergence rate analysis of the presented framework. Using the MNIST dataset, we demonstrate that the presented framework achieves performance almost identical to that of the case with no compression, while significantly reducing communication overhead for federated learning.
Submitted 29 November, 2021;
originally announced November 2021.
-
Design and Analysis of LoS MIMO Systems with Uniform Circular Arrays
Authors:
Yuri Jeon,
Gye-Tae Gil,
Yong H. Lee
Abstract:
We consider the design of a uniform circular array (UCA) based multiple-input multiple-output (MIMO) system over line-of-sight (LoS) environments in which array misalignment exists. In particular, optimal antenna placement in UCAs and transceiver architectures to achieve the maximum channel capacity without the knowledge of misalignment components are presented. To this end, we first derive a generic channel model of UCA-based LoS MIMO systems in which three misalignment factors including relative array rotation, tilting and center-shift are reflected concurrently. By factorizing the channel matrix into the singular value decomposition (SVD) form, we demonstrate that the singular values of UCA-based LoS MIMO systems are independent of tilting and center-shift. Rather, they can be expressed as a function of the radii product-to-distance ratio (RPDR) and the angle of relative array rotation. Numerical analyses of singular values show that the RPDR is a key design parameter of UCA systems. Based on this result, we propose an optimal design method for UCA systems which performs a one-dimensional search of RPDR to maximize channel capacity. It is observed that the channel matrix of the optimally designed UCA system is close to an orthogonal matrix; this fact allows channel capacity to be achieved by a simple zero-forcing (ZF) receiver. Additionally, we propose a low-complexity precoding scheme for UCA systems in which the optimal design criteria cannot be fulfilled because of limits on array size. The simulation results demonstrate the validity of the proposed design method and transceiver architectures.
Submitted 29 June, 2020;
originally announced June 2020.
-
Data-Aided Channel Estimator for MIMO Systems via Reinforcement Learning
Authors:
Yo-Seb Jeon,
Jun Li,
Nima Tavangaran,
H. Vincent Poor
Abstract:
This paper presents a data-aided channel estimator that reduces the channel estimation error of the conventional linear minimum-mean-squared-error (LMMSE) method for multiple-input multiple-output communication systems. The basic idea is to selectively exploit detected symbol vectors obtained from data detection as additional pilot signals. To optimize the selection of the detected symbol vectors, a Markov decision process (MDP) is defined which finds the best selection to minimize the mean-squared-error (MSE) of the channel estimate. Then a reinforcement learning algorithm is developed to solve this MDP in a computationally efficient manner. Simulation results demonstrate that the presented channel estimator significantly reduces the MSE of the channel estimate and therefore improves the block error rate of the system, compared to the conventional LMMSE method.
Submitted 23 March, 2020;
originally announced March 2020.
-
A Compressive Sensing Approach for Federated Learning over Massive MIMO Communication Systems
Authors:
Yo-Seb Jeon,
Mohammad Mohammadi Amiri,
Jun Li,
H. Vincent Poor
Abstract:
Federated learning is a privacy-preserving approach to train a global model at a central server by collaborating with wireless devices, each with its own local training data set. In this paper, we present a compressive sensing approach for federated learning over massive multiple-input multiple-output communication systems in which the central server equipped with a massive antenna array communicates with the wireless devices. One major challenge in system design is to accurately reconstruct, at the central server, the local gradient vectors that are computed and sent by the wireless devices. To overcome this challenge, we first establish a transmission strategy to construct sparse transmitted signals from the local gradient vectors at the devices. We then propose a compressive sensing algorithm enabling the server to iteratively find the linear minimum-mean-square-error (LMMSE) estimate of the transmitted signal by exploiting its sparsity. We also derive an analytical threshold for the residual error at each iteration, to design the stopping criterion of the proposed algorithm. We show that for a sparse transmitted signal, the proposed algorithm requires lower computational complexity than LMMSE. Simulation results demonstrate that the presented approach outperforms conventional linear beamforming approaches and reduces the performance gap between federated learning and centralized learning with perfect reconstruction.
Submitted 5 August, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Multi-Channel Volumetric Neural Network for Knee Cartilage Segmentation in Cone-beam CT
Authors:
Jennifer Maier,
Luis Carlos Rivera Monroy,
Christopher Syben,
Ye** Jeon,
Jang-Hwan Choi,
Mary Elizabeth Hall,
Marc Levenston,
Garry Gold,
Rebecca Fahrig,
Andreas Maier
Abstract:
Analyzing knee cartilage thickness and strain under load can help to further the understanding of the effects of diseases like Osteoarthritis. A precise segmentation of the cartilage is a necessary prerequisite for this analysis. This segmentation task has mainly been addressed in Magnetic Resonance Imaging, and was rarely investigated on contrast-enhanced Computed Tomography, where contrast agent visualizes the border between femoral and tibial cartilage. To overcome the main drawback of manual segmentation, namely its high time investment, we propose to use a 3D Convolutional Neural Network for this task. The presented architecture consists of a V-Net with SeLu activation, and a Tversky loss function. Due to the high imbalance between very few cartilage pixels and many background pixels, a high false positive rate is to be expected. To reduce this rate, the two largest segmented point clouds are extracted using a connected component analysis, since they most likely represent the medial and lateral tibial cartilage surfaces. The resulting segmentations are compared to manual segmentations, and achieve on average a recall of 0.69, which confirms the feasibility of this approach.
Submitted 3 December, 2019;
originally announced December 2019.
-
A fully-digital semi-rotational frequency detection algorithm for bang-bang CDRs
Authors:
Soon-Won Kwon,
Hanho Choi,
Younho Jeon,
Bong** Kim,
WooHyun Kwon,
Homin Park,
Kyeongha Kwon,
Gain Kim,
Hyeon-Min Bae
Abstract:
This work presents a new frequency acquisition method using a semi-rotational frequency detection (SRFD) algorithm for reference-less clock and data recovery (CDR) in a serial-link receiver. The proposed SRFD algorithm classifies the bang-bang phase detector (BBPD) outputs to estimate the current phase state, and detects the frequency mismatch between the input data and the sampling clock. The VCO-track path in a digital loop filter (DLF) enables online calibration of VCO frequency drift caused by temperature or voltage variation after frequency acquisition. The proposed algorithm can be implemented as a digitally synthesized circuit, lowering the design effort for reference-less CDRs. A 10 Gbps transceiver IC with the proposed algorithm, fabricated in a 65 nm CMOS process, demonstrates successful recovery of the input phase without any reference clock.
Submitted 1 May, 2019;
originally announced May 2019.
-
Robust Data Detection for MIMO Systems with One-Bit ADCs: A Reinforcement Learning Approach
Authors:
Yo-Seb Jeon,
Namyoon Lee,
H. Vincent Poor
Abstract:
The use of one-bit analog-to-digital converters (ADCs) at a receiver is a power-efficient solution for future wireless systems operating with a large signal bandwidth and/or a massive number of receive radio frequency chains. This solution, however, induces a high channel estimation error and therefore makes it difficult to perform the optimal data detection that requires perfect knowledge of likelihood functions at the receiver. In this paper, we propose a likelihood function learning method for multiple-input multiple-output (MIMO) systems with one-bit ADCs using a reinforcement learning approach. The key idea is to exploit input-output samples obtained from data detection to compensate for the mismatch in the likelihood function. The underlying difficulty of this idea is the label uncertainty in the samples caused by data detection errors. To resolve this problem, we define a Markov decision process (MDP) to maximize the accuracy of the likelihood function learned from the samples. We then develop a reinforcement learning algorithm that efficiently finds the optimal policy by approximating the transition function and the optimal state of the MDP. Simulation results demonstrate that the proposed method provides significant performance gains for the optimal data detection methods that suffer from the mismatch in the likelihood function.
Submitted 29 March, 2019;
originally announced March 2019.