BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

Authors: June-Woo Kim, Miika Toikkanen, Yera Choi, Seoung-Eun Moon, Ho-Young Jung

Abstract: Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata which includes the gender and age of patients, type of recording devices, and recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of leveraging metadata and respiratory sound samples in enhancing RSC performance. Additionally, we investigate the model performance in the case where metadata is partially unavailable, which may occur in real-world clinical setting.
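
To illustrate the "free-text descriptions derived from the sound samples' metadata" mentioned in this abstract, here is a minimal sketch that turns the four listed metadata fields into text and skips any field that is missing. The function name, dictionary keys, sentence templates, and fallback string are all hypothetical; they are not taken from the paper, whose actual text format is not given in this listing.

```python
def describe_metadata(metadata):
    """Build a free-text description from whichever metadata fields are present.

    Keys and templates are illustrative assumptions, not the paper's format.
    """
    templates = {
        "age": "Patient age group: {}.",
        "gender": "Patient gender: {}.",
        "location": "Recording location: {}.",
        "device": "Recording device: {}.",
    }
    # Keep only fields that are present and non-empty (partially missing metadata).
    sentences = [
        template.format(metadata[key])
        for key, template in templates.items()
        if metadata.get(key)
    ]
    return " ".join(sentences) if sentences else "No metadata available."


# Example values are made up for illustration.
full = describe_metadata(
    {"age": "adult", "gender": "male",
     "location": "left anterior chest", "device": "electronic stethoscope"}
)
partial = describe_metadata({"age": "adult", "device": "electronic stethoscope"})
print(full)
print(partial)
```

Dropping absent fields rather than inserting a placeholder is one simple way to handle the partially unavailable metadata case the abstract raises; other choices are possible.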

Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted INTERSPEECH 2024

arXiv:2405.02996 [pdf, other]

RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification

Authors: June-Woo Kim, Miika Toikkanen, Sangmin Bae, Minseok Kim, Ho-Young Jung

Abstract: Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation is essential. However, the most widely used augmentation technique for audio and speech, SpecAugment, requires 2-dimensional spectrogram format and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that outperforms SpecAugment, but is also suitable for respiratory sound classification with waveform pretrained models. Experimental results show that our approach outperforms the SpecAugment, demonstrating a substantial improvement in the accuracy of minority disease classes, reaching up to 7.14%.

Submitted 5 May, 2024; originally announced May 2024.

Comments: Accepted EMBC 2024

arXiv:2312.09603 [pdf, other]

Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification

Authors: June-Woo Kim, Sangmin Bae, Won-Yang Cho, Byungjo Lee, Ho-Young Jung

Abstract: Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, the respiratory sound samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into the trained models. When a significant distribution shift occurs within the test dataset or in a practical scenario, it can substantially decrease the performance. To tackle this issue, we introduce cross-domain adaptation techniques, which transfer the knowledge from a source domain to a distinct target domain. In particular, by considering different stethoscope types as individual domains, we propose a novel stethoscope-guided supervised contrastive learning approach. This method can mitigate any domain-related disparities and thus enables the model to distinguish respiratory sounds of the recording variation of the stethoscope. The experimental results on the ICBHI dataset demonstrate that the proposed methods are effective in reducing the domain dependency and achieving the ICBHI Score of 61.71%, which is a significant improvement of 2.16% over the baseline.

Submitted 15 December, 2023; originally announced December 2023.

Comments: accepted to ICASSP 2024

arXiv:2311.06480 [pdf, other]

Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance

Authors: June-Woo Kim, Chihyeon Yoon, Miika Toikkanen, Sangmin Bae, Ho-Young Jung

Abstract: Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, while only using the conventional augmentation method shows performance degradation. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes up to 26.58%. For the supplementary material, we provide the code at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.

Submitted 11 November, 2023; originally announced November 2023.

Comments: accepted in NeurIPS 2023 Workshop on Deep Generative Models for Health (DGM4H)

arXiv:2311.05889 [pdf, other]

Semantic Map Guided Synthesis of Wireless Capsule Endoscopy Images using Diffusion Models

Authors: Hae** Lee, Jeongwoo Ju, Jonghyuck Lee, Yeoun Joo Lee, Heechul Jung

Abstract: Wireless capsule endoscopy (WCE) is a non-invasive method for visualizing the gastrointestinal (GI) tract, crucial for diagnosing GI tract diseases. However, interpreting WCE results can be time-consuming and tiring. Existing studies have employed deep neural networks (DNNs) for automatic GI tract lesion detection, but acquiring sufficient training examples, particularly due to privacy concerns, remains a challenge. Public WCE databases lack diversity and quantity. To address this, we propose a novel approach leveraging generative models, specifically the diffusion model (DM), for generating diverse WCE images. Our model incorporates semantic map resulted from visualization scale (VS) engine, enhancing the controllability and diversity of generated images. We evaluate our approach using visual inspection and visual Turing tests, demonstrating its effectiveness in generating realistic and diverse WCE images.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2303.18195 [pdf, other]

Model-Free Reconstruction of Capacity Degradation Trajectory of Lithium-Ion Batteries Using Early Cycle Data

Authors: Seongyoon Kim, Hangsoon Jung, Minho Lee, Yun Young Choi, Jung-Il Choi

Abstract: Early degradation prediction of lithium-ion batteries is crucial for ensuring safety and preventing unexpected failure in manufacturing and diagnostic processes. Long-term capacity trajectory predictions can fail due to cumulative errors and noise. To address this issue, this study proposes a data-centric method that uses early single-cycle data to predict the capacity degradation trajectory of lithium-ion cells. The method involves predicting a few knots at specific retention levels using a deep learning-based model and interpolating them to reconstruct the trajectory. Two approaches are used to identify the retention levels of two to four knots: uniformly dividing the retention up to the end of life and finding optimal locations using Bayesian optimization. The proposed model is validated with experimental data from 169 cells using five-fold cross-validation. The results show that mean absolute percentage errors in trajectory prediction are less than 1.60% for all cases of knots. By predicting only the cycle numbers of at least two knots based on early single-cycle charge and discharge data, the model can directly estimate the overall capacity degradation trajectory. Further experiments suggest using three-cycle input data to achieve robust and efficient predictions, even in the presence of noise. The method is then applied to predict various shapes of capacity degradation patterns using additional experimental data from 82 cells. The study demonstrates that collecting only the cycle information of a few knots during model training and a few early cycle data points for predictions is sufficient for predicting capacity degradation. This can help establish appropriate warranties or replacement cycles in battery manufacturing and diagnosis processes.
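
The knot-and-interpolate idea in this abstract can be sketched as follows: given the cycle numbers at which a cell is predicted to reach a few fixed retention levels, the retention at any other cycle is obtained by interpolating between those knots. This sketch assumes piecewise-linear interpolation and uses made-up knot values; the abstract does not say which interpolation scheme the paper uses, and the function name is hypothetical.

```python
from bisect import bisect_right


def reconstruct_trajectory(knot_cycles, knot_retentions, query_cycles):
    """Piecewise-linear capacity-retention curve through the given knots.

    Linear interpolation is an assumption made for illustration only.
    Queries outside the knot range are clamped to the nearest knot value.
    """
    knots = sorted(zip(knot_cycles, knot_retentions))
    xs = [cycle for cycle, _ in knots]
    ys = [retention for _, retention in knots]
    trajectory = []
    for q in query_cycles:
        if q <= xs[0]:
            trajectory.append(ys[0])
        elif q >= xs[-1]:
            trajectory.append(ys[-1])
        else:
            i = bisect_right(xs, q)  # xs[i - 1] <= q < xs[i]
            weight = (q - xs[i - 1]) / (xs[i] - xs[i - 1])
            trajectory.append(ys[i - 1] + weight * (ys[i] - ys[i - 1]))
    return trajectory


# Made-up example: start of life plus three knots, the last at 80% retention.
cycles = [0, 300, 550, 700]
retentions = [1.00, 0.95, 0.875, 0.80]
curve = reconstruct_trajectory(cycles, retentions, range(0, 701, 50))
print(curve)
```

In the setting the abstract describes, the knot cycle numbers would come from the learned model rather than being typed in by hand.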

Submitted 31 March, 2023; originally announced March 2023.

arXiv:2207.05176 [pdf, other]

Denoising single images by feature ensemble revisited

Authors: Masud An Nur Islam Fahim, Nazmus Saqib, Shafkat Khan Siam, Ho Yub Jung

Abstract: Image denoising is still a challenging issue in many computer vision sub-domains. Recent studies show that significant improvements are made possible in a supervised setting. However, few challenges, such as spatial fidelity and cartoon-like smoothing remain unresolved or decisively overlooked. Our study proposes a simple yet efficient architecture for the denoising problem that addresses the aforementioned issues. The proposed architecture revisits the concept of modular concatenation instead of long and deeper cascaded connections, to recover a cleaner approximation of the given image. We find that different modules can capture versatile representations, and concatenated representation creates a richer subspace for low-level image restoration. The proposed architecture's number of parameters remains smaller than the number for most of the previous networks and still achieves significant improvements over the current state-of-the-art networks.

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2205.06695 [pdf, ps, other]

doi 10.1109/TVT.2023.3254541

STAR-RIS-Assisted Hybrid NOMA mmWave Communication: Optimization and Performance Analysis

Authors: Muhammad Faraz Ul Abrar, Muhammad Talha, Rafay Iqbal Ansari, Syed Ali Hassan, Haejoon Jung

Abstract: Simultaneously reflecting and transmitting reconfigurable intelligent surfaces (STAR-RIS) has recently emerged as prominent technology that exploits the transmissive property of RIS to mitigate the half-space coverage limitation of conventional RIS operating on millimeter-wave (mmWave). In this paper, we study a downlink STAR-RIS-based multi-user multiple-input single-output (MU-MISO) mmWave hybrid non-orthogonal multiple access (H-NOMA) wireless network, where a sum-rate maximization problem has been formulated. The design of active and passive beamforming vectors, time and power allocation for H-NOMA is a highly coupled non-convex problem. To handle the problem, we propose an optimization framework based on alternating optimization (AO) that iteratively solves active and passive beamforming sub-problems. Channel correlations and channel strength-based techniques have been proposed for a specific case of two-user optimal clustering and decoding order assignment, respectively, for which analytical solutions to joint power and time allocation for H-NOMA have also been derived. Simulation results show that: 1) the proposed framework leveraging H-NOMA outperforms conventional OMA and NOMA to maximize the achievable sum-rate; 2) using the proposed framework, the supported number of clusters for the given design constraints can be increased considerably; 3) through STAR-RIS, the number of elements can be significantly reduced as compared to conventional RIS to ensure a similar quality-of-service (QoS).

Submitted 13 May, 2022; originally announced May 2022.

Journal ref: IEEE Transactions on Vehicular Technology ( Volume: 72, Issue: 8, August 2023)

arXiv:2110.03165 [pdf, other]

Offline RL With Resource Constrained Online Deployment

Authors: Jayanth Reddy Regatti, Aniket Anand Deshmukh, Frank Cheng, Young Hun Jung, Abhishek Gupta, Urun Dogan

Abstract: Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible. As a natural consequence of these harsh conditions, an agent may lack the resources to fully observe the online environment before taking an action. We dub this situation the resource-constrained setting. This leads to situations where the offline dataset (available for training) can contain fully processed features (using powerful language models, image models, complex sensors, etc.) which are not available when actions are actually taken online. This disconnect leads to an interesting and unexplored problem in offline RL: Is it possible to use a richly processed offline dataset to train a policy which has access to fewer features in the online environment? In this work, we introduce and formalize this novel resource-constrained problem setting. We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features. We address this performance gap with a policy transfer algorithm which first trains a teacher agent using the offline dataset where features are fully available, and then transfers this knowledge to a student agent that only uses the resource-constrained features. To better capture the challenge of this setting, we propose a data collection procedure: Resource Constrained-Datasets for RL (RC-D4RL). We evaluate our transfer algorithm on RC-D4RL and the popular D4RL benchmarks and observe consistent improvement over the baseline (TD3+BC without transfer). The code for the experiments is available at https://github.com/JayanthRR/RC-OfflineRL.

Submitted 7 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Added experiments on discrete control and real world datasets along with more analyses on continuous control tasks

arXiv:2108.06890 [pdf, other]

GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints

Authors: Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Hong-Gyu Jung, Seong-Whan Lee

Abstract: Few-shot speaker adaptation is a specific Text-to-Speech (TTS) system that aims to reproduce a novel speaker's voice with a few training data. While numerous attempts have been made to the few-shot speaker adaptation system, there is still a gap in terms of speaker similarity to the target speaker depending on the amount of data. To bridge the gap, we propose GC-TTS which achieves high-quality speaker adaptation with significantly improved speaker similarity. Specifically, we leverage two geometric constraints to learn discriminative speaker representations. Here, a TTS model is pre-trained for base speakers with a sufficient amount of data, and then fine-tuned for novel speakers on a few minutes of data with two geometric constraints. Two geometric constraints enable the model to extract discriminative speaker embeddings from limited data, which leads to the synthesis of intelligible speech. We discuss and verify the effectiveness of GC-TTS by comparing it with popular and essential methods. The experimental results demonstrate that GC-TTS generates high-quality speech from only a few minutes of training data, outperforming standard techniques in terms of speaker similarity to the target speaker.

Submitted 16 August, 2021; originally announced August 2021.

Comments: Accepted paper in IEEE International Conference on Systems, Man, and Cybernetics (SMC 2021)

arXiv:2105.00286 [pdf, other]

Backhaul-Aware Intelligent Positioning of UAVs and Association of Terrestrial Base Stations for Fronthaul Connectivity

Authors: Muhammad K. Shehzad, Arsalan Ahmad, Syed Ali Hassan, Haejoon Jung

Abstract: The mushroom growth of cellular users requires novel advancements in the existing cellular infrastructure. One way to handle such a tremendous increase is to densely deploy terrestrial small-cell base stations (TSBSs) with careful management of smart backhaul/fronthaul networks. Nevertheless, terrestrial backhaul hubs significantly suffer from the dense fading environment and are difficult to install in a typical urban environment. Therefore, this paper considers the idea of replacing terrestrial backhaul network with an aerial network consisting of unmanned aerial vehicles (UAVs) to provide the fronthaul connectivity between the TSBSs and the ground core-network (GCN). To this end, we focus on the joint positioning of UAVs and the association of TSBSs such that the sum-rate of the overall system is maximized. In particular, the association problem of TSBSs with UAVs is formulated under communication-related constraints, i.e., bandwidth, number of connections to a UAV, power limit, interference threshold, UAV heights, and backhaul data rate. To meet this joint objective, we take advantage of the genetic algorithm (GA) due to the offline nature of our optimization problem. The performance of the proposed approach is evaluated using the unsupervised learning-based k-means clustering algorithm. We observe that the proposed approach is highly effective to satisfy the requirements of smart fronthaul networks.

Submitted 1 May, 2021; originally announced May 2021.

Comments: 14 pages, 9 figures, 2 tables, IEEE Transactions on Network Science and Engineering, 2021

arXiv:2104.05752 [pdf, other]

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

Authors: Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson Morais

Abstract: A major focus of recent research in spoken language understanding (SLU) has been on the end-to-end approach where a single model can predict intents directly from speech inputs without intermediate transcripts. However, this approach presents some challenges. First, since speech can be considered as personally identifiable information, in some cases only automatic speech recognition (ASR) transcripts are accessible. Second, intent-labeled speech data is scarce. To address the first challenge, we propose a novel system that can predict intents from flexible types of inputs: speech, ASR transcripts, or both. We demonstrate strong performance for either modality separately, and when both speech and ASR transcripts are available, through system combination, we achieve better results than using a single input modality. To address the second challenge, we leverage a semantically robust pre-trained BERT model and adopt a cross-modal system that co-trains text embeddings and acoustic embeddings in a shared latent space. We further enhance this system by utilizing an acoustic module pre-trained on LibriSpeech and domain-adapting the text module on our target datasets. Our experiments show significant advantages for these pre-training and fine-tuning strategies, resulting in a system that achieves competitive intent-classification performance on Snips SLU and Fluent Speech Commands datasets.

Submitted 14 June, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: Accepted to Interspeech 2021

arXiv:2011.09026 [pdf, other]

A Vision of XR-aided Teleoperation System Towards 5G/B5G

Authors: Fenghe Hu, Yansha Deng, Hui Zhou, Tae Hun Jung, Chan-Byoung Chae, A. Hamid Aghvami

Abstract: Extended Reality (XR)-aided teleoperation has shown its potential in improving operating efficiency in mission-critical, rich-information and complex scenarios. The multi-sensory XR devices introduce several new types of traffic with unique quality-of-service (QoS) requirements, which are usually defined by three measures: human perception, corresponding sensors, and present devices. To fulfil these requirements, cellular-supported wireless connectivity can be a promising solution that can largely benefit the Robot-to-XR and the XR-to-Robot links. In this article, we present industrial and piloting use cases and identify the service bottleneck of each case. We then cover the QoS of Robot-XR and XR-Robot links by summarizing the sensors' parameters and processing procedures. To realise these use cases, we introduce potential solutions for each case with cellular connections. Finally, we build testbeds to investigate the effectiveness of supporting our proposed links using current wireless topologies.

Submitted 17 November, 2020; originally announced November 2020.

Comments: 7 pages, 2 figures

arXiv:2010.07993 [pdf, ps, other]

The Shift to 6G Communications: Vision and Requirements

Authors: Muhammad Waseem Akhtar, Syed Ali Hassan, Rizwan Ghaffar, Haejoon Jung, Sahil Garg, M. Shamim Hossain

Abstract: The sixth-generation (6G) wireless communication network is expected to integrate the terrestrial, aerial, and maritime communications into a robust network which would be more reliable, fast, and can support a massive number of devices with ultra-low latency requirements. The researchers around the globe are proposing cutting edge technologies such as artificial intelligence (AI)/machine learning (ML), quantum communication/quantum machine learning (QML), blockchain, tera-Hertz and millimeter waves communication, tactile Internet, non-orthogonal multiple access (NOMA), small cells communication, fog/edge computing, etc., as the key technologies in the realization of beyond 5G (B5G) and 6G communications. In this article, we provide a detailed overview of the 6G network dimensions with air interface and associated potential technologies. More specifically, we highlight the use cases and applications of the proposed 6G networks in various dimensions. Furthermore, we also discuss the key performance indicators (KPI) for the B5G/6G network, challenges, and future research opportunities in this domain.

Submitted 15 October, 2020; originally announced October 2020.

arXiv:2008.09352 [pdf, other]

Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019

Authors: Zhang Li, Jiehua Zhang, Tao Tan, Xichao Teng, Xiaoliang Sun, Yang Li, Lihong Liu, Yang Xiao, Byungjae Lee, Yilong Li, Qianni Zhang, Shujiao Sun, Yushan Zheng, Junyu Yan, Ni Li, Yiyu Hong, Junsu Ko, Hyun Jung, Yanling Liu, Yu-cheng Chen, Ching-wei Wang, Vladimir Yurovskiy, Pavel Maevskikh, Vahid Khanagha, Yi Jiang, et al. (8 additional authors not shown)

Abstract: Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CADs) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and DICE coefficient (DC). The DC ranged from $0.7354 \pm 0.1149$ to $0.8372 \pm 0.0858$. The DC of the best method was close to the inter-observer agreement ($0.8398 \pm 0.0890$). All methods were based on deep learning and categorized into two groups: multi-model method and single model method. In general, multi-model methods were significantly better ($p < 0.01$) than single model methods, with mean DC of 0.7966 and 0.7544, respectively. Deep learning based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.
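
For readers unfamiliar with the DICE coefficient (DC) used as the evaluation metric above, the following sketch computes it for two binary masks using the standard definition, twice the overlap divided by the total number of positive pixels in both masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions made here; the challenge's own evaluation code is not described in this listing.

```python
def dice_coefficient(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks are empty: treated as perfect agreement (an assumption).
        return 1.0
    return 2.0 * intersection / total


# Toy masks: one overlapping pixel, two predicted positives, one true positive.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
print(score)
```

A score of 1.0 means the predicted and reference regions coincide exactly, and 0.0 means they do not overlap at all.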

Submitted 21 August, 2020; originally announced August 2020.

arXiv:2007.14098 [pdf, other]

Energy Efficiency and Hover Time Optimization in UAV-based HetNets

Authors: S. T. Muntaha, S. A. Hassan, H. Jung, M. S. Hossain

Abstract: In this paper, we investigate the downlink performance of a three-tier heterogeneous network (HetNet). The objective is to enhance the edge capacity of a macro cell by deploying unmanned aerial vehicles (UAVs) as flying base stations and small cells (SCs) for improving the capacity of indoor users in scenarios such as temporary hotspot regions or during disaster situations where the terrestrial network is either insufficient or out of service. UAVs are energy-constrained devices with a limited flight time, therefore, we formulate a two layer optimization scheme, where we first optimize the power consumption of each tier for enhancing the system energy efficiency (EE) under a minimum quality-of-service (QoS) requirement, which is followed by optimizing the average hover time of UAVs. We obtain the solution to these nonlinear constrained optimization problems by first utilizing the Lagrange multipliers method and then implementing a sub-gradient approach for obtaining convergence. The results show that through optimal power allocation, the system EE improves significantly in comparison to when maximum power is allocated to users (ground cellular users or connected vehicles). The hover time optimization results in increased flight time of UAVs thus providing service for longer durations.

Submitted 29 July, 2020; v1 submitted 28 July, 2020; originally announced July 2020.

Comments: 9 pages, 7 figures

arXiv:2007.12912 [pdf, other]

A Drone-Aided Blockchain-Based Smart Vehicular Network

Authors: Muhammad Asaad Cheema, Muhammad Karam Shehzad, Hassaan Khaliq Qureshi, Syed Ali Hassan, Haejoon Jung

Abstract: The staggering growth of the number of vehicles worldwide has become a critical challenge resulting in tragic incidents, environment pollution, congestion, etc. Therefore, one of the promising approaches is to design a smart vehicular system as it is beneficial to drive safely. The present vehicular system lacks data reliability, security, and easy deployment. Motivated by these issues, this paper addresses a drone-enabled intelligent vehicular system, which is secure, easy to deploy and reliable in quality. Nevertheless, an increase in the number of operating drones in the communication networks makes them more vulnerable to cyber-attacks, which can completely sabotage the communication infrastructure. To tackle these problems, we propose a blockchain-based registration and authentication system for entities such as drones, smart vehicles (SVs) and roadside units (RSUs). This paper is mainly focused on the blockchain-based secure system design and the optimal placement of drones to improve the spectral efficiency of the overall network. In particular, we investigate the association of RSUs with the drones by considering multiple communication-related factors such as available bandwidth, maximum number of links a drone can support, and backhaul limitations. We show that the proposed model can easily be overlaid on the current vehicular network, reaping benefits of secure and reliable communications.

Submitted 25 July, 2020; originally announced July 2020.

arXiv:2007.03497 [pdf, other]

STBC-Aided Cooperative NOMA with Timing Offsets, Imperfect Successive Interference Cancellation, and Imperfect Channel State Information

Authors: Muhammad Waseem Akhtar, Syed Ali Hassan, Sajid Saleem, Haejoon Jung

Abstract: The combination of non-orthogonal multiple access (NOMA) and cooperative communications can be a suitable solution for fifth generation (5G) and beyond 5G (B5G) wireless systems with massive connectivity, because it can provide higher spectral efficiency, lower energy consumption, and improved fairness compared to non-cooperative NOMA. However, the receiver complexity in conventional cooperative NOMA increases with the number of users owing to successive interference cancellation (SIC) at each user. Space time block code-aided cooperative NOMA (STBC-CNOMA) offers a smaller number of SIC operations than conventional cooperative NOMA. In this paper, we evaluate the performance of STBC-CNOMA under practical challenges such as imperfect SIC, imperfect timing synchronization between distributed cooperating users, and imperfect channel state information (CSI). We derive closed-form expressions of the received signals in the presence of such realistic impairments and then use them to evaluate outage probability. Further, we provide intuitive insights into the impact of each impairment on the outage performance through asymptotic analysis at high transmit signal-to-noise ratio. We also compare the complexity of STBC-CNOMA with existing cooperative NOMA protocols for a given number of users. In addition, through analysis and simulation, we observe that the impact of the imperfect SIC on the outage performance of STBC-CNOMA is more significant compared to the other two imperfections. Therefore, considering the smaller number of SIC operations in STBC-CNOMA compared to the other cooperative NOMA protocols, STBC-CNOMA is an effective solution to achieve high reliability for the same SIC imperfection condition.

Submitted 7 July, 2020; originally announced July 2020.

arXiv:2006.12160 [pdf, other]

Performance Analysis of Backscatter Communication Systems with Non-orthogonal Multiple Access in Nakagami Fading Channels

Authors: Ahsan Waleed Nazar, Syed Ali Hassan, Haejoon Jung, Aamir Mahmood, Mikael Gidlund

Abstract: Backscatter communication (BackCom) has been emerging as a prospective candidate in tackling lifetime management problems for massively deployed Internet-of-Things devices, which suffer from battery-related issues, i.e., replacements, charging, and recycling. This passive sensing approach allows a backscatter sensor node (BSN) to transmit information by reflecting the incident signal from a carrier emitter without initiating its transmission. To multiplex multiple BSNs, power-domain non-orthogonal multiple access (NOMA), which is a prime candidate for multiple access in beyond 5G systems, is fully exploited in this work. Recently, considerable attention has been devoted to the NOMA-aided BackCom networks in the context of outage probabilities and system throughput. However, the closed-form expressions of bit error rate (BER) for such a system have not been studied. In this paper, we present the design and analysis of a NOMA enhanced bistatic BackCom system for a battery-less smart communication paradigm. Specifically, we derive the closed-form BER expressions for a cluster of two devices in a bistatic BackCom system employing NOMA with imperfect successive interference cancellation under Nakagami-$m$ fading channel. The obtained expressions are utilized to evaluate the reflection coefficients of devices needed for the most favorable system performance. Our results also show that NOMA-BackCom achieves better data throughput compared to the orthogonal multiple access-time domain multiple access schemes (OMA-TDMA).

Submitted 22 June, 2020; originally announced June 2020.

Comments: 25 pages, 10 figures

arXiv:2004.12545 [pdf, other]

doi 10.1109/WCNCW48565.2020.9124746

Wireless VR/Haptic Open Platform for Multimodal Teleoperation

Authors: Tae Hun Jung, Hanju Yoo, Yuna Jin, Chae Eun Rhee, Chan-Byoung Chae

Abstract: With emerging trends in the fifth generation and robotics, the Internet of Skills will enable us to deliver skills or expertise anywhere over the Internet. In this paper, we propose a wireless connected virtual reality and haptic communication open platform to show the proof of concept for multimodal teleoperation systems in real-time. We focus on a practical implementation with commercial products to facilitate the access and modification of the system. The performance of the system is measured in terms of system latency and user-centric metrics.

Submitted 26 April, 2020; originally announced April 2020.

arXiv:2002.03808 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207653

Vocoder-free End-to-End Voice Conversion with Transformer Network

Authors: June-Woo Kim, Ho-Young Jung, Minho Lee

Abstract: Mel-frequency filter bank (MFB) based approaches have an advantage in learning speech compared to the raw spectrum, since MFB has a smaller feature size. However, speech generators using MFB approaches require an additional vocoder, which incurs a large computational cost during training. The additional pre/post processing such as MFB and vocoder is not essential to convert real human speech to others. It is possible to use only the raw spectrum along with the phase to generate different styles of voices with clear pronunciation. In this regard, we propose a fast and effective approach to convert realistic voices using the raw spectrum in a parallel manner. Our transformer-based model architecture, which does not have any CNN or RNN layers, has shown the advantage of learning fast and overcomes the sequential computation limitation of conventional RNNs. In this paper, we introduce a vocoder-free end-to-end voice conversion method using a transformer network. The presented conversion model can also be used in speaker adaptation for speech recognition. Our approach can convert the source voice to a target voice without using MFB and vocoder. We can get an adapted MFB for speech recognition by multiplying the converted magnitude with phase. We perform our voice conversion experiments on the TIDIGITS dataset, using mean opinion scores for naturalness, similarity, and clarity as metrics.

Submitted 5 February, 2020; originally announced February 2020.

Comments: Work in progress

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN)

arXiv:1911.10462 [pdf, other]

Design of Anti-Jamming Waveforms for Time-Hopping Spread Spectrum Systems in Tone Jamming Environments

Authors: Hyoyoung Jung, Binh Van Nguyen, Iickho Song, Kiseon Kim

Abstract: We consider the problem of designing waveforms for mitigating single tone jamming (STJ) signals with an estimated jamming frequency in time-hopping spread spectrum (TH SS) systems. The proposed design of waveforms optimizes the anti-jamming (AJ) performance of TH SS systems by minimizing the correlation between the template and STJ signals, in which the problem of waveform optimization is simplified by employing a finite number of rectangular pulses. The simplification eventually converts the design of waveforms into a problem of finding eigenvalues and eigenvectors of a matrix. Simulation results show that the waveforms designed by the proposed scheme provide performance superior not only to the conventional waveforms but also to the clipper receiver in the mitigation of STJ. The waveforms from the proposed design also exhibit a desirable AJ capability even when the estimated frequency of the STJ is not perfect.

Submitted 24 November, 2019; originally announced November 2019.

arXiv:1901.07375 [pdf]

Extension of Convolutional Neural Network with General Image Processing Kernels

Authors: Jay Hoon Jung, Yousun Shin, YoungMin Kwon

Abstract: We applied pre-defined kernels, also known as filters or masks, developed for image processing to convolutional neural networks. Instead of letting neural networks find their own kernels, we used 41 different general-purpose kernels for blurring, edge detection, sharpening, discrete cosine transformation, etc. for the first layer of the convolutional neural networks. This architecture, named the general filter convolutional neural network (GFNN), can reduce training time by 30% with better accuracy compared to the regular convolutional neural network (CNN). GFNN can also be trained to achieve 90% accuracy with only 500 samples. Furthermore, even though these kernels are not specialized for the MNIST dataset, we achieved 99.56% accuracy without ensembling or any other special algorithms.

Submitted 16 January, 2019; originally announced January 2019.

Comments: 4 pages, 6 figures

Journal ref: TENCON 2018

arXiv:1810.02085 [pdf, ps, other]

Designing Anti-Jamming Receivers for NR-DCSK Systems Utilizing ICA, WPD, and VMD Methods

Authors: Binh Van Nguyen, Minh Tuan Nguyen, Hyoyoung Jung, Kiseon Kim

Abstract: In this work, we consider an advanced noise reduction differential chaotic shift keying (NR-DCSK) system in which a single antenna source communicates with a single antenna destination under the attack of a single antenna jammer. We devote our efforts to designing a novel anti-jamming (AJ) receiver for the considered system. Particularly, we propose a variational mode decomposition-independent component analysis-wavelet packet decomposition-based (VMD-ICA-WPD-based) structure, in which the VMD method is first exploited to generate multiple signals from the single received one. Second, the ICA method is applied to coarsely separate chaotic and jamming signals. After that, the WPD method is used to finely estimate and mitigate jamming signals that exist on all outputs of the ICA method. Finally, an inverse ICA procedure is carried out, followed by a summation, and the outcome is passed through the conventional correlation-based receiver for recovering the transmitted information. Simulation results show that the proposed receiver provides significant system performance enhancement compared to that given by the conventional correlation-based receiver with WPD, i.e., 8 dB gain at BER = 0.03 and Eb/N0 = 20 dB.

Submitted 4 October, 2018; originally announced October 2018.

Comments: 5 pages, 5 figures

arXiv:1803.05627 [pdf]

doi 10.1016/j.neuroimage.2018.06.030

Quantitative Susceptibility Mapping using Deep Neural Network: QSMnet

Authors: Jaeyeon Yoon, Enhao Gong, Itthi Chatnuntawech, Berkin Bilgic, Jingu Lee, Woojin Jung, Jingyu Ko, Hosan Jung, Kawin Setsompop, Greg Zaharchuk, Eung Yeop Kim, John Pauly, Jongho Lee

Abstract: Deep neural networks have demonstrated promising potential for the field of medical image reconstruction. In this work, an MRI reconstruction algorithm, which is referred to as quantitative susceptibility mapping (QSM), has been developed using a deep neural network in order to perform dipole deconvolution, which restores magnetic susceptibility source from an MRI field map. Previous approaches of QSM require multiple orientation data (e.g. Calculation of Susceptibility through Multiple Orientation Sampling or COSMOS) or regularization terms (e.g. Truncated K-space Division or TKD; Morphology Enabled Dipole Inversion or MEDI) to solve the ill-conditioned deconvolution problem. Unfortunately, they either require long multiple orientation scans or suffer from artifacts. To overcome these shortcomings, a deep neural network, QSMnet, is constructed to generate a high quality susceptibility map from single orientation data. The network has a modified U-net structure and is trained using gold-standard COSMOS QSM maps. 25 datasets from 5 subjects (5 orientations each) were applied for patch-wise training after doubling the data using augmentation. Two additional datasets of 5 orientation data were used for validation and test (one dataset each). The QSMnet maps of the test dataset were compared with those from TKD and MEDI for image quality and consistency in multiple head orientations. Quantitative and qualitative image quality comparisons demonstrate that the QSMnet results have superior image quality to those of TKD or MEDI and have comparable image quality to those of COSMOS. Additionally, QSMnet maps reveal substantially better consistency across the multiple orientations than those from TKD or MEDI. As a preliminary application, the network was tested for two patients. The QSMnet maps showed similar lesion contrasts with those from MEDI, demonstrating potential for future applications.

Submitted 15 June, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

Comments: This work was accepted in NeuroImage on 8 June, 2018 and will be published soon. The PubMed link is https://www.ncbi.nlm.nih.gov/pubmed/29894829

Showing 1–25 of 25 results for author: Jung, H