-
BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification
Authors:
June-Woo Kim,
Miika Toikkanen,
Yera Choi,
Seoung-Eun Moon,
Ho-Young Jung
Abstract:
Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata, which includes the gender and age of patients, type of recording devices, and recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of leveraging metadata and respiratory sound samples in enhancing RSC performance. Additionally, we investigate the model performance in the case where metadata is partially unavailable, which may occur in real-world clinical settings.
Submitted 14 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
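The metadata-to-text step described in the BTS abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `describe_metadata`, the field names, the template wording, and the fallback string for fully missing metadata are all assumptions made for this example.

```python
# Hypothetical sketch: turn a metadata record into a free-text description.
# Field names and the output template are assumptions, not taken from the paper.
FIELDS = ("age", "gender", "location", "device")


def describe_metadata(metadata: dict) -> str:
    """Build a text description from whichever metadata fields are present."""
    parts = [
        f"{name}: {metadata[name]}"
        for name in FIELDS
        if metadata.get(name) is not None
    ]
    # When every field is missing, fall back to a fixed placeholder string.
    return ", ".join(parts) if parts else "no metadata available"


if __name__ == "__main__":
    full = {"age": "adult", "gender": "male",
            "location": "left posterior", "device": "Meditron"}
    partial = {"age": "adult", "device": None}
    print(describe_metadata(full))
    print(describe_metadata(partial))
```

Skipping absent fields instead of failing mirrors the partially-unavailable-metadata case that the abstract mentions; how the paper actually handles that case is not stated here.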
-
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
Authors:
June-Woo Kim,
Miika Toikkanen,
Sangmin Bae,
Minseok Kim,
Ho-Young Jung
Abstract:
Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation is essential. However, the most widely used augmentation technique for audio and speech, SpecAugment, requires a 2-dimensional spectrogram format and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that not only outperforms SpecAugment but is also suitable for respiratory sound classification with waveform-pretrained models. Experimental results show that our approach outperforms SpecAugment, demonstrating a substantial improvement in the accuracy of minority disease classes, reaching up to 7.14%.
Submitted 5 May, 2024;
originally announced May 2024.
-
Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification
Authors:
June-Woo Kim,
Sangmin Bae,
Won-Yang Cho,
Byungjo Lee,
Ho-Young Jung
Abstract:
Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, the respiratory sound samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into the trained models. When a significant distribution shift occurs within the test dataset or in a practical scenario, it can substantially decrease the performance. To tackle this issue, we introduce cross-domain adaptation techniques, which transfer the knowledge from a source domain to a distinct target domain. In particular, by considering different stethoscope types as individual domains, we propose a novel stethoscope-guided supervised contrastive learning approach. This method can mitigate domain-related disparities and thus enables the model to distinguish respiratory sounds despite recording variation across stethoscopes. The experimental results on the ICBHI dataset demonstrate that the proposed methods are effective in reducing the domain dependency and achieve an ICBHI Score of 61.71%, which is a significant improvement of 2.16% over the baseline.
Submitted 15 December, 2023;
originally announced December 2023.
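For readers unfamiliar with the metric, the ICBHI Score cited in the abstract above is commonly computed as the mean of sensitivity and specificity. The sketch below assumes that common definition, since the abstract itself does not spell it out; the function name `icbhi_score` and the integer label encoding (0 for normal, any other value for an abnormal class) are illustrative choices.

```python
# Sketch of the commonly used ICBHI Score, assuming:
#   specificity = correctly classified normal samples / all normal samples
#   sensitivity = correctly classified abnormal samples / all abnormal samples
#   score       = (sensitivity + specificity) / 2
def icbhi_score(labels, preds, normal=0):
    """Return (sensitivity, specificity, score) for integer class labels."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    normal_total = sum(1 for y in labels if y == normal)
    abnormal_total = len(labels) - normal_total
    if normal_total == 0 or abnormal_total == 0:
        raise ValueError("need at least one normal and one abnormal sample")
    normal_hits = sum(
        1 for y, p in zip(labels, preds) if y == normal and p == y
    )
    # An abnormal sample counts as a hit only if its exact class is predicted.
    abnormal_hits = sum(
        1 for y, p in zip(labels, preds) if y != normal and p == y
    )
    specificity = normal_hits / normal_total
    sensitivity = abnormal_hits / abnormal_total
    return sensitivity, specificity, (sensitivity + specificity) / 2


if __name__ == "__main__":
    print(icbhi_score([0, 0, 1, 2], [0, 1, 1, 1]))
```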
-
Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance
Authors:
June-Woo Kim,
Chihyeon Yoon,
Miika Toikkanen,
Sangmin Bae,
Ho-Young Jung
Abstract:
Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, whereas using only the conventional augmentation method degrades performance. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes by up to 26.58%. For the supplementary material, we provide the code at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.
Submitted 11 November, 2023;
originally announced November 2023.
-
Semantic Map Guided Synthesis of Wireless Capsule Endoscopy Images using Diffusion Models
Authors:
Hae** Lee,
Jeongwoo Ju,
Jonghyuck Lee,
Yeoun Joo Lee,
Heechul Jung
Abstract:
Wireless capsule endoscopy (WCE) is a non-invasive method for visualizing the gastrointestinal (GI) tract, crucial for diagnosing GI tract diseases. However, interpreting WCE results can be time-consuming and tiring. Existing studies have employed deep neural networks (DNNs) for automatic GI tract lesion detection, but acquiring sufficient training examples, particularly due to privacy concerns, remains a challenge. Public WCE databases lack diversity and quantity. To address this, we propose a novel approach leveraging generative models, specifically the diffusion model (DM), for generating diverse WCE images. Our model incorporates a semantic map produced by the visualization scale (VS) engine, enhancing the controllability and diversity of generated images. We evaluate our approach using visual inspection and visual Turing tests, demonstrating its effectiveness in generating realistic and diverse WCE images.
Submitted 10 November, 2023;
originally announced November 2023.
-
Model-Free Reconstruction of Capacity Degradation Trajectory of Lithium-Ion Batteries Using Early Cycle Data
Authors:
Seongyoon Kim,
Hangsoon Jung,
Minho Lee,
Yun Young Choi,
Jung-Il Choi
Abstract:
Early degradation prediction of lithium-ion batteries is crucial for ensuring safety and preventing unexpected failure in manufacturing and diagnostic processes. Long-term capacity trajectory predictions can fail due to cumulative errors and noise. To address this issue, this study proposes a data-centric method that uses early single-cycle data to predict the capacity degradation trajectory of lithium-ion cells. The method involves predicting a few knots at specific retention levels using a deep learning-based model and interpolating them to reconstruct the trajectory. Two approaches are used to identify the retention levels of two to four knots: uniformly dividing the retention up to the end of life and finding optimal locations using Bayesian optimization. The proposed model is validated with experimental data from 169 cells using five-fold cross-validation. The results show that mean absolute percentage errors in trajectory prediction are less than 1.60% for all cases of knots. By predicting only the cycle numbers of at least two knots based on early single-cycle charge and discharge data, the model can directly estimate the overall capacity degradation trajectory. Further experiments suggest using three-cycle input data to achieve robust and efficient predictions, even in the presence of noise. The method is then applied to predict various shapes of capacity degradation patterns using additional experimental data from 82 cells. The study demonstrates that collecting only the cycle information of a few knots during model training and a few early cycle data points for predictions is sufficient for predicting capacity degradation. This can help establish appropriate warranties or replacement cycles in battery manufacturing and diagnosis processes.
Submitted 31 March, 2023;
originally announced March 2023.
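The knot-based reconstruction described in the abstract above (predict a few knots, then interpolate between them) can be illustrated with a small sketch. It assumes piecewise-linear interpolation and an initial knot at cycle 0 with full retention; the abstract does not state which interpolation scheme is used, and the function name and example knot values are hypothetical.

```python
from bisect import bisect_right


def interpolate_retention(knots, cycle):
    """Estimate capacity retention at `cycle` from (cycle, retention) knots.

    `knots` must be sorted by strictly increasing cycle number. Linear
    interpolation is an assumption made for this sketch; values outside the
    knot range are clamped to the nearest knot.
    """
    cycles = [c for c, _ in knots]
    if cycle <= cycles[0]:
        return knots[0][1]
    if cycle >= cycles[-1]:
        return knots[-1][1]
    i = bisect_right(cycles, cycle)  # index of the first knot beyond `cycle`
    (c0, r0), (c1, r1) = knots[i - 1], knots[i]
    return r0 + (r1 - r0) * (cycle - c0) / (c1 - c0)


if __name__ == "__main__":
    # Made-up knots: cycle numbers at which retention reaches 100%, 90%, 80%.
    knots = [(0, 1.0), (100, 0.9), (200, 0.8)]
    trajectory = [interpolate_retention(knots, c) for c in range(0, 201, 50)]
    print(trajectory)
```

In this framing a model only has to predict the cycle numbers of the knots at fixed retention levels, and the full trajectory follows from interpolation.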
-
Denoising single images by feature ensemble revisited
Authors:
Masud An Nur Islam Fahim,
Nazmus Saqib,
Shafkat Khan Siam,
Ho Yub Jung
Abstract:
Image denoising is still a challenging issue in many computer vision sub-domains. Recent studies show that significant improvements are made possible in a supervised setting. However, a few challenges, such as spatial fidelity and cartoon-like smoothing, remain unresolved or overlooked. Our study proposes a simple yet efficient architecture for the denoising problem that addresses the aforementioned issues. The proposed architecture revisits the concept of modular concatenation instead of long and deeper cascaded connections, to recover a cleaner approximation of the given image. We find that different modules can capture versatile representations, and concatenated representation creates a richer subspace for low-level image restoration. The proposed architecture has fewer parameters than most previous networks and still achieves significant improvements over the current state-of-the-art networks.
Submitted 11 July, 2022;
originally announced July 2022.
-
STAR-RIS-Assisted Hybrid NOMA mmWave Communication: Optimization and Performance Analysis
Authors:
Muhammad Faraz Ul Abrar,
Muhammad Talha,
Rafay Iqbal Ansari,
Syed Ali Hassan,
Haejoon Jung
Abstract:
Simultaneously reflecting and transmitting reconfigurable intelligent surfaces (STAR-RIS) have recently emerged as a prominent technology that exploits the transmissive property of RIS to mitigate the half-space coverage limitation of conventional RIS operating at millimeter-wave (mmWave) frequencies. In this paper, we study a downlink STAR-RIS-based multi-user multiple-input single-output (MU-MISO) mmWave hybrid non-orthogonal multiple access (H-NOMA) wireless network, for which a sum-rate maximization problem is formulated. The design of active and passive beamforming vectors, time and power allocation for H-NOMA is a highly coupled non-convex problem. To handle the problem, we propose an optimization framework based on alternating optimization (AO) that iteratively solves active and passive beamforming sub-problems. Channel correlations and channel strength-based techniques have been proposed for a specific case of two-user optimal clustering and decoding order assignment, respectively, for which analytical solutions to joint power and time allocation for H-NOMA have also been derived. Simulation results show that: 1) the proposed framework leveraging H-NOMA outperforms conventional OMA and NOMA in terms of achievable sum-rate; 2) using the proposed framework, the supported number of clusters for the given design constraints can be increased considerably; 3) through STAR-RIS, the number of elements can be significantly reduced as compared to conventional RIS to ensure a similar quality-of-service (QoS).
Submitted 13 May, 2022;
originally announced May 2022.
-
Offline RL With Resource Constrained Online Deployment
Authors:
Jayanth Reddy Regatti,
Aniket Anand Deshmukh,
Frank Cheng,
Young Hun Jung,
Abhishek Gupta,
Urun Dogan
Abstract:
Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible. As a natural consequence of these harsh conditions, an agent may lack the resources to fully observe the online environment before taking an action. We dub this situation the resource-constrained setting. This leads to situations where the offline dataset (available for training) can contain fully processed features (using powerful language models, image models, complex sensors, etc.) which are not available when actions are actually taken online. This disconnect leads to an interesting and unexplored problem in offline RL: Is it possible to use a richly processed offline dataset to train a policy which has access to fewer features in the online environment? In this work, we introduce and formalize this novel resource-constrained problem setting. We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features. We address this performance gap with a policy transfer algorithm which first trains a teacher agent using the offline dataset where features are fully available, and then transfers this knowledge to a student agent that only uses the resource-constrained features. To better capture the challenge of this setting, we propose a data collection procedure: Resource Constrained-Datasets for RL (RC-D4RL). We evaluate our transfer algorithm on RC-D4RL and the popular D4RL benchmarks and observe consistent improvement over the baseline (TD3+BC without transfer). The code for the experiments is available at https://github.com/JayanthRR/RC-OfflineRL.
Submitted 7 December, 2021; v1 submitted 6 October, 2021;
originally announced October 2021.
-
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Authors:
Ji-Hoon Kim,
Sang-Hoon Lee,
Ji-Hyun Lee,
Hong-Gyu Jung,
Seong-Whan Lee
Abstract:
Few-shot speaker adaptation is a specific Text-to-Speech (TTS) task that aims to reproduce a novel speaker's voice with only a small amount of training data. While numerous attempts have been made at few-shot speaker adaptation, there is still a gap in terms of speaker similarity to the target speaker depending on the amount of data. To bridge the gap, we propose GC-TTS, which achieves high-quality speaker adaptation with significantly improved speaker similarity. Specifically, we leverage two geometric constraints to learn discriminative speaker representations. Here, a TTS model is pre-trained for base speakers with a sufficient amount of data, and then fine-tuned for novel speakers on a few minutes of data with two geometric constraints. The two geometric constraints enable the model to extract discriminative speaker embeddings from limited data, which leads to the synthesis of intelligible speech. We discuss and verify the effectiveness of GC-TTS by comparing it with popular and essential methods. The experimental results demonstrate that GC-TTS generates high-quality speech from only a few minutes of training data, outperforming standard techniques in terms of speaker similarity to the target speaker.
Submitted 16 August, 2021;
originally announced August 2021.
-
Backhaul-Aware Intelligent Positioning of UAVs and Association of Terrestrial Base Stations for Fronthaul Connectivity
Authors:
Muhammad K. Shehzad,
Arsalan Ahmad,
Syed Ali Hassan,
Haejoon Jung
Abstract:
The mushrooming growth of cellular users requires novel advancements in the existing cellular infrastructure. One way to handle such a tremendous increase is to densely deploy terrestrial small-cell base stations (TSBSs) with careful management of smart backhaul/fronthaul networks. Nevertheless, terrestrial backhaul hubs significantly suffer from the dense fading environment and are difficult to install in a typical urban environment. Therefore, this paper considers the idea of replacing the terrestrial backhaul network with an aerial network consisting of unmanned aerial vehicles (UAVs) to provide the fronthaul connectivity between the TSBSs and the ground core-network (GCN). To this end, we focus on the joint positioning of UAVs and the association of TSBSs such that the sum-rate of the overall system is maximized. In particular, the association problem of TSBSs with UAVs is formulated under communication-related constraints, i.e., bandwidth, number of connections to a UAV, power limit, interference threshold, UAV heights, and backhaul data rate. To meet this joint objective, we take advantage of the genetic algorithm (GA) due to the offline nature of our optimization problem. The performance of the proposed approach is evaluated using the unsupervised learning-based k-means clustering algorithm. We observe that the proposed approach is highly effective in satisfying the requirements of smart fronthaul networks.
Submitted 1 May, 2021;
originally announced May 2021.
-
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs
Authors:
Sujeong Cha,
Wangrui Hou,
Hyun Jung,
My Phung,
Michael Picheny,
Hong-Kwang Kuo,
Samuel Thomas,
Edmilson Morais
Abstract:
A major focus of recent research in spoken language understanding (SLU) has been on the end-to-end approach where a single model can predict intents directly from speech inputs without intermediate transcripts. However, this approach presents some challenges. First, since speech can be considered as personally identifiable information, in some cases only automatic speech recognition (ASR) transcripts are accessible. Second, intent-labeled speech data is scarce. To address the first challenge, we propose a novel system that can predict intents from flexible types of inputs: speech, ASR transcripts, or both. We demonstrate strong performance for either modality separately, and when both speech and ASR transcripts are available, through system combination, we achieve better results than using a single input modality. To address the second challenge, we leverage a semantically robust pre-trained BERT model and adopt a cross-modal system that co-trains text embeddings and acoustic embeddings in a shared latent space. We further enhance this system by utilizing an acoustic module pre-trained on LibriSpeech and domain-adapting the text module on our target datasets. Our experiments show significant advantages for these pre-training and fine-tuning strategies, resulting in a system that achieves competitive intent-classification performance on Snips SLU and Fluent Speech Commands datasets.
Submitted 14 June, 2021; v1 submitted 7 April, 2021;
originally announced April 2021.
-
A Vision of XR-aided Teleoperation System Towards 5G/B5G
Authors:
Fenghe Hu,
Yansha Deng,
Hui Zhou,
Tae Hun Jung,
Chan-Byoung Chae,
A. Hamid Aghvami
Abstract:
Extended Reality (XR)-aided teleoperation has shown its potential in improving operating efficiency in mission-critical, rich-information and complex scenarios. The multi-sensory XR devices introduce several new types of traffic with unique quality-of-service (QoS) requirements, which are usually defined by three measures: human perception, corresponding sensors, and present devices. To fulfil these requirements, cellular-supported wireless connectivity can be a promising solution that can largely benefit the Robot-to-XR and the XR-to-Robot links. In this article, we present industrial and piloting use cases and identify the service bottleneck of each case. We then cover the QoS of Robot-XR and XR-Robot links by summarizing the sensors' parameters and processing procedures. To realise these use cases, we introduce potential solutions for each case with cellular connections. Finally, we build testbeds to investigate the effectiveness of supporting our proposed links using current wireless topologies.
Submitted 17 November, 2020;
originally announced November 2020.
-
The Shift to 6G Communications: Vision and Requirements
Authors:
Muhammad Waseem Akhtar,
Syed Ali Hassan,
Rizwan Ghaffar,
Haejoon Jung,
Sahil Garg,
M. Shamim Hossain
Abstract:
The sixth-generation (6G) wireless communication network is expected to integrate terrestrial, aerial, and maritime communications into a robust network that would be more reliable and faster and could support a massive number of devices with ultra-low latency requirements. Researchers around the globe are proposing cutting-edge technologies such as artificial intelligence (AI)/machine learning (ML), quantum communication/quantum machine learning (QML), blockchain, terahertz and millimeter-wave communication, tactile Internet, non-orthogonal multiple access (NOMA), small-cell communication, fog/edge computing, etc., as the key technologies in the realization of beyond 5G (B5G) and 6G communications. In this article, we provide a detailed overview of the 6G network dimensions with air interface and associated potential technologies. More specifically, we highlight the use cases and applications of the proposed 6G networks in various dimensions. Furthermore, we discuss the key performance indicators (KPIs) for the B5G/6G network, challenges, and future research opportunities in this domain.
Submitted 15 October, 2020;
originally announced October 2020.
-
Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019
Authors:
Zhang Li,
Jiehua Zhang,
Tao Tan,
Xichao Teng,
Xiaoliang Sun,
Yang Li,
Lihong Liu,
Yang Xiao,
Byungjae Lee,
Yilong Li,
Qianni Zhang,
Shujiao Sun,
Yushan Zheng,
Junyu Yan,
Ni Li,
Yiyu Hong,
Junsu Ko,
Hyun Jung,
Yanling Liu,
Yu-cheng Chen,
Ching-wei Wang,
Vladimir Yurovskiy,
Pavel Maevskikh,
Vahid Khanagha,
Yi Jiang
, et al. (8 additional authors not shown)
Abstract:
Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CAD) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and DICE coefficient (DC). The DC ranged from $0.7354 \pm 0.1149$ to $0.8372 \pm 0.0858$. The DC of the best method was close to the inter-observer agreement ($0.8398 \pm 0.0890$). All methods were based on deep learning and categorized into two groups: multi-model methods and single-model methods. In general, multi-model methods were significantly better ($p < 0.01$) than single-model methods, with mean DC of 0.7966 and 0.7544, respectively. Deep learning based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.
Submitted 21 August, 2020;
originally announced August 2020.
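The DICE coefficient used to rank the challenge submissions above follows the standard definition DC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The sketch below works on flat sequences of 0/1 values; the function name and the convention of returning 1.0 when both masks are empty are assumptions for this illustration, and the challenge's actual evaluation code is not shown in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice coefficient of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as cancer in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    target = [1, 0, 0, 0]
    print(dice_coefficient(pred, target))  # 2 * 1 / (2 + 1)
```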
-
Energy Efficiency and Hover Time Optimization in UAV-based HetNets
Authors:
S. T. Muntaha,
S. A. Hassan,
H. Jung,
M. S. Hossain
Abstract:
In this paper, we investigate the downlink performance of a three-tier heterogeneous network (HetNet). The objective is to enhance the edge capacity of a macro cell by deploying unmanned aerial vehicles (UAVs) as flying base stations and small cells (SCs) for improving the capacity of indoor users in scenarios such as temporary hotspot regions or during disaster situations where the terrestrial network is either insufficient or out of service. UAVs are energy-constrained devices with a limited flight time; therefore, we formulate a two-layer optimization scheme, where we first optimize the power consumption of each tier for enhancing the system energy efficiency (EE) under a minimum quality-of-service (QoS) requirement, and then optimize the average hover time of UAVs. We obtain the solution to these nonlinear constrained optimization problems by first utilizing the Lagrange multipliers method and then implementing a sub-gradient approach for obtaining convergence. The results show that through optimal power allocation, the system EE improves significantly in comparison to when maximum power is allocated to users (ground cellular users or connected vehicles). The hover time optimization results in increased flight time of UAVs, thus providing service for longer durations.
Submitted 29 July, 2020; v1 submitted 28 July, 2020;
originally announced July 2020.
-
A Drone-Aided Blockchain-Based Smart Vehicular Network
Authors:
Muhammad Asaad Cheema,
Muhammad Karam Shehzad,
Hassaan Khaliq Qureshi,
Syed Ali Hassan,
Haejoon Jung
Abstract:
The staggering growth of the number of vehicles worldwide has become a critical challenge resulting in tragic incidents, environmental pollution, congestion, etc. Therefore, one of the promising approaches is to design a smart vehicular system, as it is beneficial for safe driving. Present vehicular systems lack data reliability, security, and easy deployment. Motivated by these issues, this paper addresses a drone-enabled intelligent vehicular system, which is secure, easy to deploy and reliable in quality. Nevertheless, an increase in the number of operating drones in the communication networks makes them more vulnerable to cyber-attacks, which can completely sabotage the communication infrastructure. To tackle these problems, we propose a blockchain-based registration and authentication system for entities such as drones, smart vehicles (SVs) and roadside units (RSUs). This paper is mainly focused on the blockchain-based secure system design and the optimal placement of drones to improve the spectral efficiency of the overall network. In particular, we investigate the association of RSUs with the drones by considering multiple communication-related factors such as available bandwidth, maximum number of links a drone can support, and backhaul limitations. We show that the proposed model can easily be overlaid on the current vehicular network, reaping the benefits of secure and reliable communications.
Submitted 25 July, 2020;
originally announced July 2020.
-
STBC-Aided Cooperative NOMA with Timing Offsets, Imperfect Successive Interference Cancellation, and Imperfect Channel State Information
Authors:
Muhammad Waseem Akhtar,
Syed Ali Hassan,
Sajid Saleem,
Haejoon Jung
Abstract:
The combination of non-orthogonal multiple access (NOMA) and cooperative communications can be a suitable solution for fifth generation (5G) and beyond 5G (B5G) wireless systems with massive connectivity, because it can provide higher spectral efficiency, lower energy consumption, and improved fairness compared to non-cooperative NOMA. However, the receiver complexity in conventional cooperative NOMA increases with an increasing number of users owing to successive interference cancellation (SIC) at each user. Space time block code-aided cooperative NOMA (STBC-CNOMA) requires fewer SIC operations than conventional cooperative NOMA. In this paper, we evaluate the performance of STBC-CNOMA under practical challenges such as imperfect SIC, imperfect timing synchronization between distributed cooperating users, and imperfect channel state information (CSI). We derive closed-form expressions of the received signals in the presence of such realistic impairments and then use them to evaluate outage probability. Further, we provide intuitive insights into the impact of each impairment on the outage performance through asymptotic analysis at high transmit signal-to-noise ratio. We also compare the complexity of STBC-CNOMA with existing cooperative NOMA protocols for a given number of users. In addition, through analysis and simulation, we observe that the impact of imperfect SIC on the outage performance of STBC-CNOMA is more significant than that of the other two imperfections. Therefore, considering the smaller number of SIC operations in STBC-CNOMA compared to the other cooperative NOMA protocols, STBC-CNOMA is an effective solution to achieve high reliability for the same SIC imperfection condition.
Submitted 7 July, 2020;
originally announced July 2020.
-
Performance Analysis of Backscatter Communication Systems with Non-orthogonal Multiple Access in Nakagami Fading Channels
Authors:
Ahsan Waleed Nazar,
Syed Ali Hassan,
Haejoon Jung,
Aamir Mahmood,
Mikael Gidlund
Abstract:
Backscatter communication (BackCom) has been emerging as a prospective candidate in tackling lifetime management problems for massively deployed Internet-of-Things devices, which suffer from battery-related issues, i.e., replacements, charging, and recycling. This passive sensing approach allows a backscatter sensor node (BSN) to transmit information by reflecting the incident signal from a carrier emitter without initiating its transmission. To multiplex multiple BSNs, power-domain non-orthogonal multiple access (NOMA), which is a prime candidate for multiple access in beyond 5G systems, is fully exploited in this work. Recently, considerable attention has been devoted to the NOMA-aided BackCom networks in the context of outage probabilities and system throughput. However, the closed-form expressions of bit error rate (BER) for such a system have not been studied. In this paper, we present the design and analysis of a NOMA enhanced bistatic BackCom system for a battery-less smart communication paradigm. Specifically, we derive the closed-form BER expressions for a cluster of two devices in a bistatic BackCom system employing NOMA with imperfect successive interference cancellation under Nakagami-$m$ fading channel. The obtained expressions are utilized to evaluate the reflection coefficients of devices needed for the most favorable system performance. Our results also show that NOMA-BackCom achieves better data throughput compared to the orthogonal multiple access-time domain multiple access schemes (OMA-TDMA).
Submitted 22 June, 2020;
originally announced June 2020.
-
Wireless VR/Haptic Open Platform for Multimodal Teleoperation
Authors:
Tae Hun Jung,
Hanju Yoo,
Yuna Jin,
Chae Eun Rhee,
Chan-Byoung Chae
Abstract:
With emerging trends in the fifth generation and robotics, the Internet of Skills will enable us to deliver skills or expertise anywhere over the Internet. In this paper, we propose a wireless connected virtual reality and haptic communication open platform to show the proof of concept for multimodal teleoperation systems in real-time. We focus on a practical implementation with commercial products to facilitate the access and modification of the system. The performance of the system is measured in terms of system latency and user-centric metrics.
Submitted 26 April, 2020;
originally announced April 2020.
-
Vocoder-free End-to-End Voice Conversion with Transformer Network
Authors:
June-Woo Kim,
Ho-Young Jung,
Minho Lee
Abstract:
Mel-frequency filter bank (MFB) based approaches have an advantage in learning speech compared to the raw spectrum, since MFB has a smaller feature size. However, speech generators based on MFB approaches require an additional vocoder, which incurs a huge computational expense in the training process. The additional pre/post processing such as MFB and vocoder is not essential to convert real human speech to others. It is possible to use only the raw spectrum along with the phase to generate different styles of voices with clear pronunciation. In this regard, we propose a fast and effective approach to convert realistic voices using the raw spectrum in a parallel manner. Our transformer-based model architecture, which does not have any CNN or RNN layers, has shown the advantage of learning fast and solves the limitation of sequential computation in conventional RNNs. In this paper, we introduce a vocoder-free end-to-end voice conversion method using a transformer network. The presented conversion model can also be used in speaker adaptation for speech recognition. Our approach can convert the source voice to a target voice without using MFB and vocoder. We can get an adapted MFB for speech recognition by multiplying the converted magnitude with phase. We perform our voice conversion experiments on the TIDIGITS dataset, evaluating naturalness, similarity, and clarity with mean opinion scores.
Submitted 5 February, 2020;
originally announced February 2020.
-
Design of Anti-Jamming Waveforms for Time-Hopping Spread Spectrum Systems in Tone Jamming Environments
Authors:
Hyoyoung Jung,
Binh Van Nguyen,
Iickho Song,
Kiseon Kim
Abstract:
We consider the problem of designing waveforms for mitigating single tone jamming (STJ) signals with an estimated jamming frequency in time-hopping spread spectrum (TH SS) systems. The proposed design of waveforms optimizes the anti-jamming (AJ) performance of TH SS systems by minimizing the correlation between the template and STJ signals, in which the problem of waveform optimization is simplified by employing a finite number of rectangular pulses. The simplification eventually makes the design of waveforms be converted into a problem of finding eigenvalues and eigenvectors of a matrix. Simulation results show that the waveforms designed by the proposed scheme provide us with performance superior not only to the conventional waveforms but also to the clipper receiver in the mitigation of STJ. The waveforms from the proposed design also exhibit a desirable AJ capability even when the estimated frequency of the STJ is not perfect.
Submitted 24 November, 2019;
originally announced November 2019.
-
Extension of Convolutional Neural Network with General Image Processing Kernels
Authors:
Jay Hoon Jung,
Yousun Shin,
YoungMin Kwon
Abstract:
We applied pre-defined kernels, also known as filters or masks, developed for image processing to convolutional neural networks. Instead of letting neural networks find their own kernels, we used 41 different general-purpose kernels for blurring, edge detection, sharpening, discrete cosine transformation, etc. in the first layer of the convolutional neural networks. This architecture, named the general filter convolutional neural network (GFNN), can reduce training time by 30% with better accuracy compared to the regular convolutional neural network (CNN). GFNN can also be trained to achieve 90% accuracy with only 500 samples. Furthermore, even though these kernels are not specialized for the MNIST dataset, we achieved 99.56% accuracy without ensembles or any other special algorithms.
Submitted 16 January, 2019;
originally announced January 2019.
-
Designing Anti-Jamming Receivers for NR-DCSK Systems Utilizing ICA, WPD, and VMD Methods
Authors:
Binh Van Nguyen,
Minh Tuan Nguyen,
Hyoyoung Jung,
Kiseon Kim
Abstract:
In this work, we consider an advanced noise reduction differential chaotic shift keying (NR-DCSK) system in which a single antenna source communicates with a single antenna destination under the attack of a single antenna jammer. We devote our efforts to designing a novel anti-jamming (AJ) receiver for the considered system. Particularly, we propose a variational mode decomposition-independent component analysis-wavelet packet decomposition-based (VMD-ICA-WPD-based) structure, in which the VMD method is first exploited to generate multiple signals from the single received one. Second, the ICA method is applied to coarsely separate chaotic and jamming signals. After that, the WPD method is used to finely estimate and mitigate jamming signals that exist on all outputs of the ICA method. Finally, an inverse ICA procedure is carried out, followed by a summation, and the outcome is passed through the conventional correlation-based receiver for recovering the transmitted information. Simulation results show that the proposed receiver provides significant system performance enhancement compared to that given by the conventional correlation-based receiver with WPD, i.e., 8 dB gain at BER = 0.03 and Eb/N0 = 20 dB.
Submitted 4 October, 2018;
originally announced October 2018.
-
Quantitative Susceptibility Mapping using Deep Neural Network: QSMnet
Authors:
Jaeyeon Yoon,
Enhao Gong,
Itthi Chatnuntawech,
Berkin Bilgic,
Jingu Lee,
Woojin Jung,
Jingyu Ko,
Hosan Jung,
Kawin Setsompop,
Greg Zaharchuk,
Eung Yeop Kim,
John Pauly,
Jongho Lee
Abstract:
Deep neural networks have demonstrated promising potential for the field of medical image reconstruction. In this work, an MRI reconstruction algorithm, which is referred to as quantitative susceptibility mapping (QSM), has been developed using a deep neural network in order to perform dipole deconvolution, which restores magnetic susceptibility source from an MRI field map. Previous approaches to QSM require multiple orientation data (e.g. Calculation of Susceptibility through Multiple Orientation Sampling or COSMOS) or regularization terms (e.g. Truncated K-space Division or TKD; Morphology Enabled Dipole Inversion or MEDI) to solve the ill-conditioned deconvolution problem. Unfortunately, they either require long multiple orientation scans or suffer from artifacts. To overcome these shortcomings, a deep neural network, QSMnet, is constructed to generate a high quality susceptibility map from single orientation data. The network has a modified U-net structure and is trained using gold-standard COSMOS QSM maps. 25 datasets from 5 subjects (5 orientations each) were used for patch-wise training after doubling the data using augmentation. Two additional datasets of 5 orientation data were used for validation and test (one dataset each). The QSMnet maps of the test dataset were compared with those from TKD and MEDI for image quality and consistency in multiple head orientations. Quantitative and qualitative image quality comparisons demonstrate that the QSMnet results have superior image quality to those of TKD or MEDI and have comparable image quality to those of COSMOS. Additionally, QSMnet maps reveal substantially better consistency across the multiple orientations than those from TKD or MEDI. As a preliminary application, the network was tested for two patients. The QSMnet maps showed similar lesion contrasts with those from MEDI, demonstrating potential for future applications.
Submitted 15 June, 2018; v1 submitted 15 March, 2018;
originally announced March 2018.