Search | arXiv e-print repository

doi 10.1016/j.compbiomed.2024.108746

Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans

Authors: Kwang-Hyun Uhm, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

Abstract: Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and ability to characterize renal lesions. However, since enhancement patterns of renal lesions across CT phases are different even for the same lesion type, the visual assessment by radiologists suffers from inter-observer variability in clinical practice. Although deep learning-based approaches have been recently explored for differential diagnosis of kidney cancer, they do not explicitly model the relationships between CT phases in the network design, limiting the diagnostic performance. In this paper, we propose a novel lesion-aware cross-phase attention network (LACPANet) that can effectively capture temporal dependencies of renal lesions across CT phases to accurately classify the lesions into five major pathological subtypes from time-series multi-phase CT images. We introduce a 3D inter-phase lesion-aware attention mechanism to learn effective 3D lesion features that are used to estimate attention weights describing the inter-phase relations of the enhancement patterns. We also present a multi-scale attention scheme to capture and aggregate temporal patterns of lesion features at different spatial scales for further improvement. Extensive experiments on multi-phase CT scans of kidney cancer patients from the collected dataset demonstrate that our LACPANet outperforms state-of-the-art approaches in diagnostic accuracy.
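The idea of attention weights that describe inter-phase relations can be illustrated with plain scaled dot-product attention over one feature vector per CT phase. This is a minimal sketch under stated assumptions, not LACPANet's lesion-aware module: the function names are hypothetical, queries, keys, and values are taken to be the raw phase features (no learned projections), and the 3D lesion feature extraction and multi-scale aggregation are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_phase_attention(phase_features):
    """Scaled dot-product attention across CT phases (hypothetical helper).

    phase_features: list of P feature vectors (one per CT phase), each of length d.
    Returns (weights, outputs): weights[i][j] is how strongly phase i attends to
    phase j, and outputs[i] is the attention-weighted mix of all phase features.
    Identity projections are assumed; a trained network would learn them.
    """
    num_phases = len(phase_features)
    dim = len(phase_features[0])
    scale = 1.0 / math.sqrt(dim)
    weights, outputs = [], []
    for query in phase_features:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in phase_features]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(row[j] * phase_features[j][c] for j in range(num_phases))
                        for c in range(dim)])
    return weights, outputs


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # toy features for 4 phases
weights, outputs = cross_phase_attention(feats)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to one, so it can be read as a distribution over the phases that a given phase draws on.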

Submitted 24 June, 2024; originally announced June 2024.

Comments: This article has been accepted for publication in Computers in Biology and Medicine

Journal ref: Computers in Biology and Medicine, 108746, 2024

arXiv:2406.14953

Deep Imbalanced Regression to Estimate Vascular Age from PPG Data: a Novel Digital Biomarker for Cardiovascular Health

Authors: Guangkun Nie, Qinghao Zhao, Gongzheng Tang, Jun Li, Shenda Hong

Abstract: Photoplethysmography (PPG) is emerging as a crucial tool for monitoring human hemodynamics, with recent studies highlighting its potential in assessing vascular aging through deep learning. However, real-world age distributions are often imbalanced, posing significant challenges for deep learning models. In this paper, we introduce a novel, simple, and effective loss function named the Dist Loss to address deep imbalanced regression tasks. We trained a one-dimensional convolutional neural network (Net1D) incorporating the Dist Loss on the extensive UK Biobank dataset (n=502,389) to estimate vascular age from PPG signals and validated its efficacy in characterizing cardiovascular health. The model's performance was validated on a 40% held-out test set, achieving state-of-the-art results, especially in regions with small sample sizes. Furthermore, we divided the population into three subgroups based on the difference between predicted vascular age and chronological age: less than -10 years, between -10 and 10 years, and greater than 10 years. We analyzed the relationship between predicted vascular age and several cardiovascular events over a follow-up period of up to 10 years, including death, coronary heart disease, and heart failure. Our results indicate that the predicted vascular age has significant potential to reflect an individual's cardiovascular health status. Our code will be available at https://github.com/Ngk03/AI-vascular-age.
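The three-way subgrouping by age gap is simple enough to state as code. A minimal sketch, assuming the gap is predicted minus chronological age and that gaps of exactly -10 or +10 years belong to the middle group (the abstract does not say how the boundaries are assigned); the function name and group labels are hypothetical. The Dist Loss itself is not described in the abstract and is not sketched.

```python
def vascular_age_group(predicted_age: float, chronological_age: float) -> str:
    """Assign a subject to one of three groups by (predicted - chronological) age.

    Thresholds follow the abstract: < -10, -10 to 10, > 10 years.
    Treating exactly -10 and +10 as the middle group is an assumption.
    The labels "younger" / "similar" / "older" are hypothetical.
    """
    gap = predicted_age - chronological_age
    if gap < -10:
        return "younger"  # vascular age more than 10 years below chronological age
    if gap > 10:
        return "older"    # vascular age more than 10 years above chronological age
    return "similar"


print(vascular_age_group(40, 55), vascular_age_group(60, 55), vascular_age_group(70, 55))
```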

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.08815

Deep Reinforcement Learning-based Quadcopter Controller: A Practical Approach and Experiments

Authors: Truong-Dong Do, Nguyen Xuan Mung, Sung Kyung Hong

Abstract: Quadcopters have been studied for decades thanks to their maneuverability and capability of operating in a variety of circumstances. However, quadcopters suffer from dynamical nonlinearity, actuator saturation, and sensor noise, which make it challenging and time-consuming to obtain accurate dynamic models and achieve satisfactory control performance. Deep reinforcement learning has shown significant potential in system modelling and control of autonomous multirotor aerial vehicles, with recent advancements in deployment, performance enhancement, and generalization. In this paper, an end-to-end deep reinforcement learning-based controller for quadcopters is proposed that is safe for real-world implementation, data-efficient, and free of human gain adjustments. First, a novel actor-critic-based architecture is designed to map the robot states directly to the motor outputs. Then, a quadcopter dynamics-based simulator is devised to facilitate the training of the controller policy. Finally, the trained policy is deployed on a real Crazyflie nano quadrotor platform without any additional fine-tuning. Experimental results show that the quadcopter achieves satisfactory performance when tracking a given complicated trajectory, which demonstrates the effectiveness and feasibility of the proposed method and its capability to bridge the simulation-to-reality gap.

Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 6 pages, 5 figures, 3 tables

arXiv:2406.02000

Advancing Ultra-Reliable 6G: Transformer and Semantic Localization Empowered Robust Beamforming in Millimeter-Wave Communications

Authors: Avi Deb Raha, Kitae Kim, Apurba Adhikary, Mrityunjoy Gain, Choong Seon Hong

Abstract: Advancements in 6G wireless technology have elevated the importance of beamforming, especially for attaining ultra-high data rates via millimeter-wave (mmWave) frequency deployment. Although promising, mmWave bands require substantial beam training to achieve precise beamforming. While initial deep learning models that use RGB camera images demonstrated promise in reducing beam training overhead, their performance suffers from sensitivity to lighting and environmental variations. This sensitivity causes Quality of Service (QoS) to fluctuate, eventually affecting the stability and dependability of networks in dynamic environments, which emphasizes a critical need for more robust solutions. This paper proposes a robust beamforming technique to ensure consistent QoS under varying environmental conditions. An optimization problem is formulated to maximize users' data rates. To solve the formulated NP-hard optimization problem, we decompose it into two subproblems: the semantic localization problem and the optimal beam selection problem. To solve the semantic localization problem, we propose a novel method that leverages k-means clustering and the YOLOv8 model. To solve the beam selection problem, we propose a novel lightweight hybrid architecture that utilizes various data sources and a weighted entropy-based mechanism to predict the optimal beams. Since rapid and accurate beam predictions are needed to maintain QoS, a novel metric, Accuracy-Complexity Efficiency (ACE), is proposed to quantify this requirement. Six testing scenarios have been developed to evaluate the robustness of the proposed model. Finally, simulation results demonstrate that the proposed model outperforms several state-of-the-art baselines in terms of beam prediction accuracy, received power, and ACE in the developed test scenarios.

Submitted 21 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.19771

Data Service Maximization in Integrated Terrestrial-Non-Terrestrial 6G Networks: A Deep Reinforcement Learning Approach

Authors: Nway Nway Ei, Kitae Kim, Yan Kyaw Tun, Choong Seon Hong

Abstract: Integrating terrestrial and non-terrestrial networks has emerged as a promising paradigm to fulfill the constantly growing demand for connectivity, low transmission delay, and quality of service (QoS). This integration brings together the strengths of both: the reliability of terrestrial networks and the broad coverage and service continuity of non-terrestrial networks such as low earth orbit (LEO) satellites. In this work, we study a data service maximization problem in an integrated terrestrial-non-terrestrial network (I-TNT) where ground base stations (GBSs) and LEO satellites cooperatively serve coexisting aerial users (AUs) and ground users (GUs). Considering spectrum scarcity, interference, and the QoS requirements of the users, we jointly optimize the user association, the AUs' trajectories, and power allocation. To tackle the formulated mixed-integer non-convex problem, we decompose it into two subproblems: 1) a user association problem and 2) a trajectory and power allocation problem. Since the user association problem is a binary integer programming problem, we use a standard convex optimization method to solve it. Meanwhile, the trajectory and power allocation problem is solved by the deep deterministic policy gradient (DDPG) method to cope with the problem's non-convexity and dynamic network environments. The two subproblems are then solved alternately by the proposed iterative algorithm. Extensive simulations against baselines from the existing literature are conducted to evaluate the performance of the proposed framework.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 5 pages, 4 figures

arXiv:2403.14126

Sub-Nyquist Sampling OFDM Radar With a Time-Frequency Phase-Coded Waveform

Authors: Seonghyeon Kang, Kawon Han, Songcheol Hong

Abstract: This paper presents a time-frequency phase-coded sub-Nyquist sampling orthogonal frequency division multiplexing (PC-SNS-OFDM) radar system that reduces the analog-to-digital converter (ADC) sampling rate without any additional hardware or signal processing. The proposed radar divides the transmitted OFDM signal into multiple sub-bands along the frequency axis and makes these sub-bands orthogonal by multiplying them with phase codes in both the time and frequency domains. Because the sampling rate is reduced by a factor equal to the number of sub-bands, the sub-bands above the sampling rate fold into the lowest one due to aliasing. In restoring the signals in the folded sub-bands to the full signal band, the proposed PC-SNS-OFDM radar effectively eliminates symbol-mismatch noise while introducing trade-offs in the range and Doppler ambiguities. Using phase codes in both the frequency and time domains provides flexible control of the range and Doppler ambiguities. It also improves the signal-to-noise ratio (SNR) of detected targets compared to an earlier sub-Nyquist sampling OFDM radar system. This is validated with simulations and experiments under various sub-Nyquist sampling rates.
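Why phase coding makes folded sub-bands separable can be shown with a toy model. A minimal sketch under loudly stated assumptions: each of the L sub-bands is tagged over L consecutive OFDM symbols with a DFT-type phase code exp(j2π·l·m/L) (an illustrative choice, not necessarily the paper's codes), the content of each sub-band stays constant over those symbols, only the time-domain codes are modeled, and all function names are hypothetical. Correlating the folded samples with the conjugate code of sub-band l cancels every other sub-band because the codes are mutually orthogonal.

```python
import cmath


def dft_phase_codes(num_sub_bands):
    """Hypothetical time-domain codes: code l at symbol m is exp(j*2*pi*l*m/L)."""
    L = num_sub_bands
    return [[cmath.exp(2j * cmath.pi * l * m / L) for m in range(L)] for l in range(L)]


def fold(sub_band_values, codes):
    """Aliasing adds all coded sub-bands into one band: one folded sample per symbol m."""
    L = len(codes)
    return [sum(codes[l][m] * sub_band_values[l] for l in range(L)) for m in range(L)]


def separate(folded, codes):
    """Correlate the folded samples with each conjugate code and average over L symbols."""
    L = len(codes)
    return [sum(folded[m] * codes[l][m].conjugate() for m in range(L)) / L
            for l in range(L)]


codes = dft_phase_codes(4)
sub_bands = [1 + 2j, -0.5j, 3.0, 0.25 - 1j]  # toy values carried by each sub-band
recovered = separate(fold(sub_bands, codes), codes)
print(recovered)
```

If the sub-band contents changed from symbol to symbol (for example, due to a large Doppler shift), the cancellation would no longer be exact, which hints at why such codes interact with Doppler ambiguity.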

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.11405

A Deep Learning Method for Beat-Level Risk Analysis and Interpretation of Atrial Fibrillation Patients during Sinus Rhythm

Authors: Jun Lei, Yuxi Zhou, Xue Tian, Qinghao Zhao, Qi Zhang, Shijia Geng, Qingbo Wu, Shenda Hong

Abstract: Atrial Fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues, so early detection of AF is crucial. Existing algorithms can only distinguish "AF rhythm in AF patients" from "sinus rhythm in normal individuals". However, AF patients do not always exhibit AF rhythm, which makes diagnosis challenging when the AF rhythm is absent. To address this, this paper proposes a novel artificial intelligence (AI) algorithm that distinguishes "sinus rhythm in AF patients" from "sinus rhythm in normal individuals" at the beat level. We introduce beat-level risk interpreters and trend risk interpreters to address the interpretability issues of deep learning models and the difficulty of explaining AF risk trends. Additionally, a beat-level information fusion decision is presented to enhance model accuracy. The experimental results demonstrate that the average AUC for single beats used as testing data from the CPSC 2021 dataset is 0.7314. When the information fusion decision algorithm uses 150 beats, the average AUC reaches 0.7591. Compared to previous segment-level algorithms, we use beats as input, which reduces data dimensionality and makes the model more lightweight, facilitating deployment on portable medical devices. Furthermore, we report new findings through average beat analysis and subgroup analysis, considering varying risk levels.
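One simple way to picture a beat-level information fusion decision is to pool per-beat risk scores over a window of beats. A minimal sketch, assuming the model outputs one AF-risk probability per beat and that fusion is a plain mean over the most recent 150 beats; mean pooling and the function name are illustrative assumptions, and the paper's actual fusion rule may differ.

```python
from statistics import fmean


def fuse_beat_risks(beat_probs, n_beats=150):
    """Average per-beat AF-risk probabilities over the last n_beats beats.

    Mean fusion is an illustrative assumption, not necessarily the paper's rule.
    """
    if not beat_probs:
        raise ValueError("need at least one beat")
    window = beat_probs[-n_beats:]  # keep only the most recent beats
    return fmean(window)


print(fuse_beat_risks([0.2, 0.4, 0.6]))
```

Pooling many beats smooths out noisy single-beat predictions, which is consistent with the abstract's report of a higher AUC with 150 beats than with single beats.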

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.09083

Asymptotically Near-Optimal Hybrid Beamforming for mmWave IRS-Aided MIMO Systems

Authors: Jeongjae Lee, Songnam Hong

Abstract: Hybrid beamforming is an emerging technology for massive multiple-input multiple-output (MIMO) systems due to its lower complexity, cost, and power consumption. Recently, the intelligent reflection surface (IRS) has been proposed as a cost-effective technique for robust millimeter-wave (mmWave) MIMO systems. It is therefore necessary to jointly optimize a reflection vector and hybrid beamforming matrices for IRS-aided mmWave MIMO systems. Because the IRS lacks RF chains, the TX-IRS and IRS-RX channels cannot be acquired separately. Instead, efficient methods exist in the literature to estimate the so-called effective (or cascaded) channel. For the first time, we derive the near-optimal solution of the aforementioned joint optimization using only the effective channel. Based on our theoretical analysis, we develop a practical reflection vector and hybrid beamforming matrices by projecting the asymptotic solution onto the modulus constraint. Simulations demonstrate that the proposed construction can outperform the state-of-the-art (SOTA) method, even though the latter requires knowledge of the TX-IRS and IRS-RX channels separately. Furthermore, our construction is robust to channel estimation errors, which are inevitable in practical massive MIMO systems.
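Projecting an unconstrained solution onto a modulus constraint is a standard step for IRS reflection vectors and phase-shifter (analog) precoders. A minimal sketch, assuming the usual unit-modulus constraint |v_i| = 1 on every entry; this is the generic entrywise projection, not the authors' full construction, and the function name is hypothetical.

```python
import cmath


def project_unit_modulus(x):
    """Project each complex entry onto the unit circle: v_i = x_i / |x_i|.

    For each entry this is the closest point with |v_i| = 1 in Euclidean distance.
    A zero entry has no defined phase; mapping it to 1 is an arbitrary choice.
    """
    return [cmath.exp(1j * cmath.phase(z)) if z != 0 else 1 + 0j for z in x]


v = project_unit_modulus([3 + 4j, -2j, 0])
print(v)
```

The projection keeps only the phase of each entry, which is exactly the degree of freedom a passive reflecting element or a phase shifter can realize.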

Submitted 24 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Submitted to IEEE Transactions on Vehicular Technology

arXiv:2403.02028

Target Localization and Performance Trade-Offs in Cooperative ISAC Systems: A Scheme Based on 5G NR OFDM Signals

Authors: Zhenkun Zhang, Hong Ren, Cunhua Pan, Sheng Hong, Dongming Wang, Jiangzhou Wang, Xiaohu You

Abstract: The integration of sensing capabilities into communication systems, by sharing physical resources, has significant potential for reducing spectrum, hardware, and energy costs while inspiring innovative applications. Cooperative networks, in particular, are expected to enhance sensing services by enlarging the coverage area and enriching sensing measurements, thus improving the service availability and accuracy. This paper proposes a cooperative integrated sensing and communication (ISAC) framework by leveraging information-carrying orthogonal frequency division multiplexing (OFDM) signals transmitted by access points (APs). Specifically, we propose a two-stage scheme for target localization, where communication signals are reused as sensing reference signals based on the system information shared at the central processing unit (CPU). In Stage I, we measure the ranges of scattered paths induced by targets by extracting time-delay information from the signals received at the APs. Then, in Stage II, the target locations are estimated based on these range measurements. Considering that the scattered paths corresponding to some targets may not be detectable by all APs, we propose an effective algorithm to match the range measurements with the targets and estimate the target locations. Notably, by analyzing the OFDM numerologies defined in fifth generation (5G) standards, we elucidate the flexibility and consistency of performance trade-offs in both communication and sensing.
Finally, numerical results confirm the effectiveness of our sensing scheme and the cooperative gain of the ISAC framework.
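The numerology trade-off can be made concrete with textbook OFDM-radar relations. A minimal sketch, assuming 5G NR subcarrier spacing Δf = 15·2^μ kHz, a fixed count of 3300 subcarriers (the NR maximum of 275 resource blocks × 12, chosen here only for illustration), and monostatic range formulas that ignore the cyclic prefix; the paper's cooperative, multi-AP geometry is not modeled and the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def nr_ofdm_sensing_limits(mu: int, n_subcarriers: int):
    """Illustrative monostatic OFDM-radar limits for NR numerology mu.

    delta_f                 = 15 kHz * 2**mu
    range resolution        = c / (2 * B), with B = n_subcarriers * delta_f
    max unambiguous range   = c / (2 * delta_f)
    Cyclic prefix and bistatic geometry are ignored (assumption).
    """
    delta_f = 15e3 * 2 ** mu
    bandwidth = n_subcarriers * delta_f
    return {
        "subcarrier_spacing_hz": delta_f,
        "range_resolution_m": C / (2 * bandwidth),
        "max_unambiguous_range_m": C / (2 * delta_f),
    }


for mu in range(4):
    lim = nr_ofdm_sensing_limits(mu, n_subcarriers=3300)
    print(mu, round(lim["range_resolution_m"], 3), round(lim["max_unambiguous_range_m"], 1))
```

With the subcarrier count fixed, each step up in μ halves both the range resolution cell (finer) and the maximum unambiguous range (shorter), which is one example of the kind of sensing trade-off that numerology choices introduce.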

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.12783

A Review of Deep Learning Methods for Photoplethysmography Data

Authors: Guangkun Nie, Jiabao Zhu, Gongzheng Tang, Deyun Zhang, Shijia Geng, Qinghao Zhao, Shenda Hong

Abstract: Photoplethysmography (PPG) is a highly promising technology due to its portability, user-friendly operation, and non-invasive capability to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this review, we systematically reviewed papers that applied deep learning models to process PPG data, published between January 1, 2017, and July 31, 2023, and retrieved from Google Scholar, PubMed, and Dimensions. Each paper is analyzed from three key perspectives: tasks, models, and data. In total, we extracted 193 papers in which different deep learning frameworks were used to process PPG signals. Based on the tasks addressed in these papers, we categorized them into two major groups: medical-related and non-medical-related. The medical-related tasks were further divided into seven subgroups: blood pressure analysis, cardiovascular monitoring and diagnosis, sleep health, mental health, respiratory monitoring and analysis, blood glucose analysis, and others. The non-medical-related tasks were divided into four subgroups: signal processing, biometric identification, electrocardiogram reconstruction, and human activity recognition. In conclusion, significant progress has recently been made in using deep learning methods to process PPG data, allowing a more thorough exploration and utilization of the information contained in PPG signals.
However, challenges remain, such as the limited quantity and quality of publicly available databases, a lack of effective validation in real-world scenarios, and concerns about the interpretability, scalability, and complexity of deep learning models. Moreover, there are still emerging research areas that require further investigation.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.11419

Joint UAV Deployment and Resource Allocation in THz-Assisted MEC-Enabled Integrated Space-Air-Ground Networks

Authors: Yan Kyaw Tun, György Dán, Yu Min Park, Choong Seon Hong

Abstract: Multi-access edge computing (MEC)-enabled integrated space-air-ground (SAG) networks have drawn much attention recently, as they can provide communication and computing services to wireless devices in areas that lack terrestrial base stations (TBSs). Leveraging the ample bandwidth in the terahertz (THz) spectrum, in this paper, we propose MEC-enabled integrated SAG networks with collaboration among unmanned aerial vehicles (UAVs). We then formulate the problem of minimizing the energy consumption of devices and UAVs in the proposed networks by optimizing task offloading decisions, THz sub-band assignment, transmit power control, and UAV deployment. The formulated problem is a mixed-integer nonlinear programming (MINLP) problem with a non-convex structure, which is challenging to solve. We thus propose a block coordinate descent (BCD) approach that decomposes the problem into four subproblems: 1) the device task offloading decision problem, 2) the THz sub-band assignment and power control problem, 3) the UAV deployment problem, and 4) the UAV task offloading decision problem. We then use a matching game, the concave-convex procedure (CCP), successive convex approximation (SCA), and block successive upper-bound minimization (BSUM) to solve the individual subproblems. Finally, extensive simulations are performed to demonstrate the effectiveness of our proposed algorithm.
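The block coordinate descent pattern (fix all blocks but one, solve for that block, and cycle) can be shown on a toy problem. A minimal sketch, assuming a two-block convex quadratic with closed-form block updates; this is a generic illustration of BCD, not the paper's four-block decomposition, whose subproblems are handled by matching, CCP, SCA, and BSUM. The objective and function name are hypothetical.

```python
def bcd_toy(iterations: int = 50):
    """Block coordinate descent on the toy convex objective

        f(x, y) = (x - 1)**2 + (y + 2)**2 + x * y,

    alternating exact minimization over block x (y fixed) and block y (x fixed).
    """
    x, y = 0.0, 0.0
    for _ in range(iterations):
        x = 1 - y / 2   # argmin over x: 2(x - 1) + y = 0
        y = -2 - x / 2  # argmin over y: 2(y + 2) + x = 0
    return x, y


x_opt, y_opt = bcd_toy()
print(x_opt, y_opt)
```

Because each block update exactly minimizes the objective with the other block fixed, the objective never increases from one sweep to the next; here the iterates converge to the joint minimizer (8/3, -10/3).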

Submitted 21 January, 2024; originally announced January 2024.

Comments: 36 pages, 8 figures

arXiv:2401.06966

Near-Field Channel Estimation for XL-RIS Assisted Multi-User XL-MIMO Systems: Hybrid Beamforming Architectures

Authors: Jeongjae Lee, Hyeong** Chung, Yunseong Cho, Sunwoo Kim, Songnam Hong

Abstract: Channel estimation is one of the key challenges for the deployment of extremely large-scale reconfigurable intelligent surface (XL-RIS) assisted multiple-input multiple-output (MIMO) systems. In this paper, we study the channel estimation problem for XL-RIS assisted multi-user XL-MIMO systems with hybrid beamforming structures. For this system, we propose a unified channel estimation method that yields notable estimation accuracy in near-field BS-RIS and near-field RIS-User channels (near-near field channels for short), far-near field channels, and far-far field channels. Our key idea is that the effective (or cascaded) channels to be estimated can each be factorized as the product of low-rank matrices, i.e., the product of a common (user-independent) matrix and a user-specific coefficient matrix. The common matrix, whose columns form a basis of the column space of the BS-RIS channel matrix, is efficiently estimated via a collaborative low-rank approximation (CLRA). Leveraging the hybrid beamforming structures, we develop an efficient iterative algorithm that jointly optimizes the user-specific coefficient matrices. Via experiments and complexity analysis, we verify the effectiveness of the proposed channel estimation method (named CLRA-JO) in the aforementioned three classes of wireless channels.

Submitted 25 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: submitted to IEEE Transactions on Communications

arXiv:2312.08714

Aerial STAR-RIS Empowered MEC: A DRL Approach for Energy Minimization

Authors: Pyae Sone Aung, Loc X. Nguyen, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

Abstract: Multi-access Edge Computing (MEC) addresses computational and battery limitations in devices by allowing them to offload computation tasks. To overcome the difficulties in establishing line-of-sight connections, integrating unmanned aerial vehicles (UAVs) has proven beneficial, offering enhanced data exchange, rapid deployment, and mobility. The utilization of reconfigurable intelligent surfaces (RIS), specifically simultaneously transmitting and reflecting RIS (STAR-RIS) technology, further extends coverage capabilities and introduces flexibility in MEC. This study explores the integration of UAVs and STAR-RIS to facilitate communication between IoT devices and an MEC server. The formulated problem aims to minimize energy consumption for IoT devices and aerial STAR-RIS by jointly optimizing task offloading, aerial STAR-RIS trajectory, amplitude and phase shift coefficients, and transmit power. Given the non-convexity of the problem and the dynamic environment, solving it directly in polynomial time is challenging. Therefore, deep reinforcement learning (DRL), particularly proximal policy optimization (PPO), is introduced for its sample efficiency and stability. Simulation results illustrate the effectiveness of the proposed system compared to benchmark schemes in the literature.
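The stability that PPO is credited with comes largely from its clipped surrogate objective, which limits how far a single update can move the policy. A minimal sketch of the standard PPO-clip objective for a batch of samples, assuming probability ratios and advantage estimates are already computed; the function name is hypothetical, and the paper's state, action, and reward design for the aerial STAR-RIS problem is not modeled.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean PPO-clip surrogate objective (to be maximized).

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for sample i
    Per sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1 - epsilon), 1 + epsilon)
        total += min(r * a, clipped * a)
    return total / len(ratios)


print(ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

Taking the minimum means a large ratio earns no extra credit for a positive advantage, while a harmful update (negative advantage) is still fully penalized; a training loop would minimize the negative of this value.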

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.05548

doi 10.1109/JBHI.2022.3219123

A Unified Multi-Phase CT Synthesis and Classification Framework for Kidney Cancer Diagnosis with Incomplete Data

Authors: Kwang-Hyun Uhm, Seung-Won Jung, Moon Hyung Choi, Sung-Hoo Hong, Sung-Jea Ko

Abstract: Multi-phase CT is widely adopted for the diagnosis of kidney cancer due to the complementary information among phases. However, the complete set of multi-phase CT is often not available in practical clinical applications. In recent years, several studies have generated the missing modality image from the available data. Nevertheless, the generated images are not guaranteed to be effective for the diagnosis task. In this paper, we propose a unified framework for kidney cancer diagnosis with incomplete multi-phase CT, which simultaneously recovers missing CT images and classifies cancer subtypes using the completed set of images. The advantage of our framework is that it encourages a synthesis model to explicitly learn to generate missing CT phases that are helpful for classifying cancer subtypes. We further incorporate a lesion segmentation network into our framework to exploit lesion-level features for effective cancer classification from whole CT volumes. The proposed framework is based on fully 3D convolutional neural networks that jointly optimize both the synthesis and classification of 3D CT volumes. Extensive experiments on both in-house and external datasets demonstrate the effectiveness of our framework for diagnosis with incomplete data compared with state-of-the-art baselines. In particular, cancer subtype classification using the CT data completed by our method achieves higher performance than classification using the given incomplete data.

Submitted 9 December, 2023; originally announced December 2023.

Comments: This article has been accepted for publication in IEEE Journal of Biomedical and Health Informatics

Journal ref: JBHI, 2022

arXiv:2312.05528

Exploring 3D U-Net Training Configurations and Post-Processing Strategies for the MICCAI 2023 Kidney and Tumor Segmentation Challenge

Authors: Kwang-Hyun Uhm, Hyunjun Cho, Zhixin Xu, Seohoon Lim, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

Abstract: In 2023, it is estimated that 81,800 kidney cancer cases will be newly diagnosed and 14,890 people will die from this cancer in the United States. Preoperative dynamic contrast-enhanced abdominal computed tomography (CT) is often used for detecting lesions. However, there is inter-observer variability due to subtle differences in the imaging features of kidneys and kidney tumors. In this paper, we explore various 3D U-Net training configurations and effective post-processing strategies for accurate segmentation of kidneys, cysts, and kidney tumors in CT images. We validated our model on the dataset of the 2023 Kidney and Kidney Tumor Segmentation (KiTS23) challenge. Our method took second place in the final ranking of the KiTS23 challenge on unseen test data, with an average Dice score of 0.820 and an average Surface Dice of 0.712.
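For reference, the volumetric Dice score reported above measures the overlap between a predicted and a reference mask. A minimal sketch on flattened binary masks, assuming 0/1 voxel labels and returning 1.0 when both masks are empty (a common convention, assumed here); the challenge's exact averaging over regions and its Surface Dice metric are not reproduced, and the function name is hypothetical.

```python
def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks (0/1 values).

    Returning 1.0 when both masks are empty is an assumed convention.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```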

Submitted 9 December, 2023; originally announced December 2023.

Comments: MICCAI 2023, KITS 2023 challenge 2nd place

arXiv:2308.02076

doi 10.1109/TRS.2023.3333430

Sub-Nyquist Sampling OFDM Radar

Authors: Kawon Han, SeongHyeon Kang, Songcheol Hong

Abstract: In this paper, we propose a sub-Nyquist sampling (SNS) orthogonal frequency-division multiplexing (OFDM) radar system capable of reducing the analog-to-digital converter (ADC) sampling rate in OFDM radar without any additional modification of its hardware or waveform. To this end, the proposed system uses an ADC sampling rate of B/L to sample the received baseband signal with a bandwidth of B, where L is a positive proper divisor of the number of subcarriers. This divides the baseband signal into L sub-bands, which fold into a sub-Nyquist frequency band due to aliasing. By leveraging the known modulation symbols of the transmitted signal, the folded signal can be unfolded to the full-band signal. This allows target ranges to be estimated with the range resolution of the full signal bandwidth B without degrading the maximum unambiguous range. During the signal-unfolding process, the signals from the other sub-bands remain as symbol-mismatch noise (SMN), which significantly degrades the signal-to-noise ratio (SNR) of the detected targets. It also causes weaker targets to be submerged under the noise in range profiles. To resolve this, a symbol-mismatch noise cancellation (SMNC) technique is also proposed, which reconstructs the interfering signals from the other sub-bands using the detected targets and subtracts them from the unfolded signal. As a result, the proposed sub-Nyquist sampling OFDM radar and corresponding signal processing technique enable a reduction in the ADC sampling rate by a factor of L while incurring only a 10 log10(L) dB increase in the noise due to noise folding. This is validated through simulations and measurements with various sub-sampling ratios.
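The folding and unfolding steps described above can be illustrated with a noiseless single-target simulation. This is a minimal NumPy sketch under assumed toy parameters (256 QPSK subcarriers, L = 4, one point target, no cyclic prefix or thermal noise), not the authors' implementation. Keeping every L-th time-domain sample makes the N/L-point DFT equal to the average of the L sub-bands; dividing each full-band subcarrier by its known symbol then recovers the full-resolution range profile, with symbol-mismatch noise on top. The SMNC step is not implemented here.

```python
import numpy as np

def sub_nyquist_sample(x_full: np.ndarray, L: int) -> np.ndarray:
    """Keep every L-th sample (ADC rate B/L) and return the M = N/L point DFT of the folded signal."""
    return np.fft.fft(x_full[::L])

def unfold(Y_folded: np.ndarray, symbols: np.ndarray, L: int) -> np.ndarray:
    """Map each full-band subcarrier k to folded bin k mod M and divide by its known symbol.

    The result is the full-band channel estimate plus symbol-mismatch noise (SMN)
    from the other L-1 sub-bands that alias onto the same folded bin.
    """
    N = symbols.size
    M = N // L
    k = np.arange(N)
    return L * Y_folded[k % M] / symbols

rng = np.random.default_rng(0)
N, L, d = 256, 4, 37                       # subcarriers, sub-sampling ratio, target delay bin (toy values)
symbols = np.exp(1j * (np.pi / 2 * rng.integers(0, 4, N) + np.pi / 4))   # random QPSK symbols
H = np.exp(-2j * np.pi * np.arange(N) * d / N)                           # single point target
x = np.fft.ifft(symbols * H)               # one received OFDM symbol sampled at the full rate B

Y = sub_nyquist_sample(x, L)
# Aliasing check: the folded spectrum is the average of the L sub-bands.
assert np.allclose(Y, (symbols * H).reshape(L, N // L).sum(axis=0) / L)

profile = np.abs(np.fft.ifft(unfold(Y, symbols, L)))
print(int(np.argmax(profile)))             # full-resolution range bin of the target
```

With these values the target peak has unit height while the SMN floor is on the order of sqrt((L-1)/N), which is why weaker targets can be buried and why the abstract adds a cancellation step.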

Submitted 3 August, 2023; originally announced August 2023.

Comments: 12 pages, 13 figures

Journal ref: IEEE Transactions on Radar Systems, vol. 1, pp. 669-680, 2023

arXiv:2307.15469

SpaceRIS: LEO Satellite Coverage Maximization in 6G Sub-THz Networks by MAPPO DRL and Whale Optimization

Authors: Sheikh Salman Hassan, Yu Min Park, Yan Kyaw Tun, Walid Saad, Zhu Han, Choong Seon Hong

Abstract: Satellite systems face a significant challenge in effectively utilizing limited communication resources to meet the demands of ground network traffic, which is spatially asymmetric and time-varying. Moreover, the coverage range and signal transmission distance of low Earth orbit (LEO) satellites are restricted by notable propagation attenuation, molecular absorption, and space losses at sub-terahertz (THz) frequencies. This paper introduces a novel approach to maximize LEO satellite coverage by leveraging reconfigurable intelligent surfaces (RISs) within 6G sub-THz networks. The optimization objectives include enhancing the end-to-end data rate and optimizing satellite-remote user equipment (RUE) associations, data packet routing within satellite constellations, the RIS phase shift, and the ground base station (GBS) transmit power (i.e., active beamforming). The formulated joint optimization problem poses significant challenges owing to its time-varying environment, non-convex characteristics, and NP-hard complexity. To address these challenges, we propose a block coordinate descent (BCD) algorithm that integrates balanced K-means clustering, multi-agent proximal policy optimization (MAPPO) deep reinforcement learning (DRL), and the whale optimization algorithm (WOA). The performance of the proposed approach is demonstrated through comprehensive simulation results, which show its superiority over existing baseline methods in the literature.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.10550

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

Authors: Daegyeom Kim, Seongho Hong, Yong-Hoon Choi

Abstract: Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes text sentences and prompt audio as input and is designed to generate controllable speech: rather than simply mimicking the characteristics of the prompt audio, it controls the attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and the original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics to assess the accuracy of generated sentences. To compare the quality of synthesized speech, we measure the comparative mean opinion score (CMOS) and similarity mean opinion score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and the mel-spectrogram when modifying the trained tokens. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are located on GitHub.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.14385

Calibration of Wideband LFM Radars based on Sliding Window Algorithm

Authors: Hyung-Woo Kim, **-woo Kim, **-ha Kim, JaeYoung Choi, Sangpyo Hong, Byungkwan Kim

Abstract: This paper addresses the challenges of wideband signal beamforming in radar systems and proposes a new calibration method. Operating conditions can change the frequency-dependent characteristics of the system and introduce amplitude, phase, and time delay errors. The proposed method is based on the concept of a sliding window algorithm for linear frequency modulated (LFM) signals. To calibrate the frequency-dependent errors from the transceiver and the time delay error from true time delay elements, the proposed method exploits the characteristics of the LFM signal. Because the frequency of an LFM signal changes linearly with time, the frequency-domain characteristics of the hardware appear in the time domain. Therefore, by applying a matched filter to a part of the LFM signal, the frequency-dependent characteristics can be monitored and calibrated. The proposed method is compared with conventional matched-filter-based calibration and verified by simulation results and beampatterns. Since the proposed method uses the LFM signal as a calibration tone, it can be applied to any beamforming system, not only LFM radars.
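The key property, that an LFM chirp maps frequency onto time, can be sketched as follows. Under assumed parameters (100 MHz sampling, a 50 MHz sweep, and a hypothetical 2 ns delay error modeled as the phase -2πfτ), a zero-lag matched filter on each window of the chirp measures the phase error at that window's instantaneous frequency, and a linear fit of phase against frequency recovers the delay. This illustrates the principle only; it is not the paper's calibration procedure.

```python
import numpy as np

fs = 100e6                       # sample rate in Hz (assumed)
N = 4096                         # samples per chirp (assumed)
B = 50e6                         # swept bandwidth in Hz (assumed)
t = np.arange(N) / fs
k = B / (N / fs)                 # chirp rate in Hz/s
f0 = -B / 2
inst_freq = f0 + k * t           # instantaneous frequency: frequency maps linearly onto time
ref = np.exp(1j * (2 * np.pi * f0 * t + np.pi * k * t**2))

# Hypothetical hardware error: a 2 ns time delay, i.e. a phase of -2*pi*f*tau at frequency f,
# applied sample by sample through the chirp's time-frequency mapping.
tau_true = 2e-9
rx = ref * np.exp(-2j * np.pi * inst_freq * tau_true)

def sliding_window_phase(rx, ref, inst_freq, W, hop):
    """Zero-lag matched filter on each window of the chirp.

    Returns the mean instantaneous frequency of every window and the phase of
    rx relative to ref measured in that window.
    """
    freqs, phases = [], []
    for start in range(0, len(ref) - W + 1, hop):
        seg = slice(start, start + W)
        corr = np.sum(rx[seg] * np.conj(ref[seg]))
        freqs.append(inst_freq[seg].mean())
        phases.append(np.angle(corr))
    return np.array(freqs), np.unwrap(np.array(phases))

freqs, phases = sliding_window_phase(rx, ref, inst_freq, W=256, hop=128)
slope, _ = np.polyfit(freqs, phases, 1)   # phase vs. frequency is linear for a pure delay
tau_est = -slope / (2 * np.pi)
print(f"estimated delay: {tau_est * 1e9:.3f} ns")
```

A calibration stage could then multiply each channel by the conjugate of its estimated error; amplitude errors could be read from the magnitude of each window's correlation in the same way.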

Submitted 25 June, 2023; originally announced June 2023.

Comments: 11 pages

arXiv:2304.14838

Vision-based Target Pose Estimation with Multiple Markers for the Perching of UAVs

Authors: Truong-Dong Do, Nguyen Xuan-Mung, Sung-Kyung Hong

Abstract: Autonomous Nano Aerial Vehicles have become increasingly popular in surveillance and monitoring operations due to their efficiency and maneuverability. Once a target location has been reached, drones do not have to remain active during the mission. In such situations, the vehicle can perch and stop its motors to conserve energy, as well as maintain a static position in unfavorable flying conditions. In the perching target estimation phase, the stability and accuracy of a visual camera with markers are a significant challenge. A large marker is readily detectable from afar, but as the drone approaches, it quickly falls out of the camera's field of view. In this paper, a vision-based target pose estimation method using multiple markers is proposed to deal with these problems. First, a perching target with a small marker inside a larger one is designed to improve detection capability at wide and close ranges. Second, the relative poses of the flying vehicle are calculated from detected markers using a monocular camera. Next, a Kalman filter is applied to provide a more stable and reliable pose estimation, especially when measurement data is missing for unexpected reasons. Finally, we introduce an algorithm for merging the pose data from multiple markers. The poses are then sent to the position controller to align the drone with the marker's center and steer it to perch on the target. The experimental results demonstrate the effectiveness and feasibility of the adopted approach. The drone can perch successfully onto the center of the markers with an attached 25 mm-diameter round magnet.
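The role of the Kalman filter when marker detections drop out can be sketched with a constant-velocity model on a single pose axis. The state layout, noise levels, frame interval, and measurement sequence below are assumptions for illustration; the abstract does not specify the filter design. On frames with no detection, the filter skips the update and coasts on its prediction.

```python
import numpy as np

class ConstantVelocityKF:
    """Kalman filter for one pose axis with state [position, velocity] (assumed model)."""

    def __init__(self, dt: float, q: float = 1e-4, r: float = 1e-2):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity state transition
        self.H = np.array([[1.0, 0.0]])              # the marker gives position only
        self.Q = q * np.eye(2)                       # process noise covariance (assumed)
        self.R = np.array([[r]])                     # measurement noise covariance (assumed)
        self.x = np.zeros((2, 1))                    # initial state estimate
        self.P = np.eye(2)                           # initial state covariance

    def step(self, z=None) -> float:
        """Run one frame; z is the measured position, or None if no marker was detected."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        if z is not None:
            innovation = np.array([[z]]) - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ innovation
            self.P = (np.eye(2) - K @ self.H) @ self.P
        return float(self.x[0, 0])

# Hypothetical x-offset to the marker centre (m) at 20 Hz; None marks lost detections.
kf = ConstantVelocityKF(dt=0.05)
measurements = [0.5] * 40 + [None] * 5 + [0.5] * 10
estimates = [kf.step(z) for z in measurements]
print(f"final estimate: {estimates[-1]:.3f} m")
```

One filter per axis keeps the sketch short; a full 6-DoF pose filter would carry orientation states as well.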

Submitted 25 April, 2023; originally announced April 2023.

Comments: 5 pages, 6 figures, 2 tables

arXiv:2303.07697

DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions

Authors: Geumbyeol Hwang, Sunwon Hong, Seunghyun Lee, Sungwoo Park, Gyeongsu Chae

Abstract: For realistic talking head generation, creating natural head motion while maintaining accurate lip synchronization is essential. To fulfill this challenging task, we propose DisCoHead, a novel method to disentangle and control head pose and facial expressions without supervision. DisCoHead uses a single geometric transformation as a bottleneck to isolate and extract head motion from a head-driving video. Either an affine or a thin-plate spline transformation can be used, and both work well as geometric bottlenecks. We enhance the efficiency of DisCoHead by integrating a dense motion estimator and the encoder of a generator, which were originally separate modules. Taking a step further, we also propose a neural mix approach where dense motion is estimated and applied implicitly by the encoder. After applying the disentangled head motion to a source identity, DisCoHead controls the mouth region according to speech audio, and it blinks eyes and moves eyebrows following a separate driving video of the eye region, via the weight modulation of convolutional neural networks. The experiments using multiple datasets show that DisCoHead successfully generates realistic audio-and-video-driven talking heads and outperforms state-of-the-art methods. Project page: https://deepbrainai-research.github.io/discohead/

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted to ICASSP 2023

arXiv:2303.02573

doi 10.1109/TVT.2023.3251415

Learning Decentralized Power Control in Cell-Free Massive MIMO Networks

Authors: Daesung Yu, Hoon Lee, Seung-Eun Hong, Seok-Hwan Park

Abstract: This paper studies learning-based decentralized power control methods for cell-free massive multiple-input multiple-output (MIMO) systems where a central processor (CP) controls access points (APs) through fronthaul coordination. To determine the transmission policy of distributed APs, it is essential to develop a network-wide collaborative optimization mechanism. To address this challenge, we design a cooperative learning (CL) framework which manages computation and coordination strategies of the CP and APs using dedicated deep neural network (DNN) modules. To build a versatile learning structure, the proposed CL is carefully designed such that its forward pass calculations are independent of the number of APs. To this end, we adopt a parameter reuse concept which installs an identical DNN module at all APs. Consequently, the proposed CL trained at a particular configuration can be readily applied to arbitrary AP populations. Numerical results validate the advantages of the proposed CL over conventional non-cooperative approaches.
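The parameter-reuse idea, where every AP runs an identical DNN module so that the forward pass does not depend on the number of APs, can be sketched with untrained NumPy layers. The layer sizes, the mean pooling at the CP, and the sigmoid power head are hypothetical choices, not the paper's architecture; the point is only that one set of weights serves any AP population.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, MSG = 4, 3   # hypothetical sizes of per-AP features and CP messages

# Untrained weights: ONE set shared by every AP (parameter reuse) and one set for the CP.
W_ap_msg = rng.normal(size=(FEAT, MSG))        # AP-side encoder producing a fronthaul message
W_cp = rng.normal(size=(MSG, MSG))             # CP-side layer applied to the pooled messages
W_ap_out = rng.normal(size=(FEAT + MSG, 1))    # AP-side power decision head

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def cooperative_forward(local_feats: np.ndarray, p_max: float = 1.0) -> np.ndarray:
    """local_feats has shape (num_aps, FEAT); returns per-AP transmit power in (0, p_max)."""
    msgs = np.tanh(local_feats @ W_ap_msg)                   # same module at every AP
    pooled = np.tanh(msgs.mean(axis=0) @ W_cp)               # mean pooling: size-agnostic CP
    feedback = np.tile(pooled, (local_feats.shape[0], 1))    # CP broadcasts to all APs
    decision_in = np.concatenate([local_feats, feedback], axis=1)
    return p_max * sigmoid(decision_in @ W_ap_out).ravel()

# The same weights run unchanged for 4 or 16 APs.
for num_aps in (4, 16):
    powers = cooperative_forward(rng.normal(size=(num_aps, FEAT)))
    print(num_aps, powers.shape)
```

Because the CP only sees a pooled message, reordering the APs simply reorders the outputs, and no weight shape depends on the AP count.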

Submitted 4 March, 2023; originally announced March 2023.

Comments: accepted for publication on IEEE Transactions on Vehicular Technology

arXiv:2302.12533

doi 10.1186/s13104-023-06400-4

HUST bearing: a practical dataset for ball bearing fault diagnosis

Authors: Nguyen Duc Thuan, Hoang Si Hong

Abstract: In this work, we introduce a practical dataset named HUST bearing that provides a large set of vibration data on different ball bearings. This dataset contains 90 raw vibration recordings covering 6 types of defects (inner crack, outer crack, ball crack, and their 2-combinations) on 5 types of bearing at 3 working conditions, with a sample rate of 51,200 samples per second. We performed envelope analysis and order tracking analysis on the introduced dataset to allow an initial evaluation of the data. A number of classical machine learning classification methods are used to identify bearing faults in the dataset using features in different domains. Typical advanced unsupervised transfer learning algorithms are also evaluated to observe the transferability of knowledge among parts of the dataset. The examined methods yield varying accuracies on the dataset, reaching up to 100% on the classification task and 60-80% on the unsupervised transfer learning task.
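Envelope analysis, one of the initial evaluations mentioned above, demodulates the vibration signal and looks for the defect frequency in the spectrum of its envelope. The sketch below uses the dataset's 51,200 Hz sample rate but a synthetic amplitude-modulated signal; the 97 Hz defect frequency and 3 kHz carrier are hypothetical, and the analytic signal is built with an FFT so that only NumPy is required.

```python
import numpy as np

def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal (the usual discrete Hilbert-transform construction)."""
    n = x.size
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def envelope_spectrum(x: np.ndarray, fs: float):
    env = np.abs(analytic_signal(x))
    env = env - env.mean()                     # drop DC so the fault line stands out
    spec = np.abs(np.fft.rfft(env)) / env.size
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    return freqs, spec

fs = 51_200                                    # sample rate used by the HUST bearing dataset
t = np.arange(fs) / fs                         # 1 s of synthetic data
fault_freq, resonance = 97.0, 3000.0           # hypothetical defect frequency and carrier (Hz)
x = (1 + 0.5 * np.cos(2 * np.pi * fault_freq * t)) * np.cos(2 * np.pi * resonance * t)

freqs, spec = envelope_spectrum(x, fs)
print(freqs[np.argmax(spec)])                  # 97.0
```

On real recordings a band-pass filter around the structural resonance is usually applied before demodulation, and the peak is compared with the bearing's characteristic defect frequencies.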

Submitted 2 October, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

Comments: We are considering some issues in the paper

arXiv:2302.04499

RIS-Position and Orientation Estimation in MIMO-OFDM Systems with Practical Scatterers

Authors: Sheng Hong, Minghui Li, Cunhua Pan, Marco Di Renzo, Wei Zhang, Lajos Hanzo

Abstract: In this paper, we investigate the problem of estimating the position and the angle of rotation of a mobile station (MS) in a millimeter wave (mmWave) multiple-input-multiple-output (MIMO) system aided by a reconfigurable intelligent surface (RIS). The virtual line-of-sight (VLoS) link created by the RIS and the non-line-of-sight (NLoS) links that originate from scatterers in the considered environment are utilized to facilitate the estimation. A two-step positioning scheme is exploited, where the channel parameters are first acquired, and the position-related parameters are then estimated. The channel parameters are obtained through a coarse estimation process followed by a finer one. For the coarse estimation, the distributed compressed sensing orthogonal simultaneous matching pursuit (DCS-SOMP) algorithm, the maximum likelihood (ML) algorithm, and the discrete Fourier transform (DFT) are utilized to separately estimate the channel parameters. The obtained channel parameters are then jointly refined by using the space-alternating generalized expectation maximization (SAGE) algorithm, which circumvents the high-dimensional optimization issue of ML estimation. Starting from the estimated channel parameters, the position-related parameters are estimated. The performance of estimating the channel-related and position-related parameters is theoretically quantified by using the Cramer-Rao lower bound (CRLB). Simulation results demonstrate the superior performance of the proposed positioning algorithms.

Submitted 23 May, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2211.05910

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, **gang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, **woo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li , et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can achieve real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
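INT8 models like those in this challenge store tensors as 8-bit integers plus a scale and a zero point. A generic per-tensor affine quantization sketch in NumPy follows; it is an assumption for illustration and not the challenge's conversion toolchain, which the abstract does not describe.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Per-tensor affine INT8 quantization: x is approximated by scale * (q - zero_point)."""
    x_min = min(float(x.min()), 0.0)     # keep 0.0 inside the range so it is exactly representable
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(round(-128 - x_min / scale))   # integer code that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return scale * (q.astype(np.float64) - zero_point)

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)                     # hypothetical floating-point layer weights
q, scale, zp = quantize_int8(weights)
err = np.max(np.abs(dequantize_int8(q, scale, zp) - weights))
print(q.dtype, zp, f"max abs error {err:.4f} vs scale/2 {scale / 2:.4f}")
```

The rounding error per value is at most half a quantization step, which is why quantization-aware training or careful calibration matters for a quality-sensitive task such as super-resolution.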

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

arXiv:2211.04470

Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo , et al. (14 additional authors not shown)

Abstract: Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth estimation solutions that can achieve real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset that was collected with the ZED stereo camera, which is capable of generating depth maps for objects located up to 50 meters away. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA resolution depth maps at up to 27 FPS while achieving high fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile device, and their detailed description is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2105.08630, arXiv:2211.03885; text overlap with arXiv:2105.08819, arXiv:2105.08826, arXiv:2105.08629, arXiv:2105.07809, arXiv:2105.07825

arXiv:2210.15850

doi 10.1109/BigComp51126.2021.00039

Federated Learning based Energy Demand Prediction with Clustered Aggregation

Authors: Ye Lin Tun, Kyi Thar, Chu Myaet Thwal, Choong Seon Hong

Abstract: To reduce negative environmental impacts, power stations and energy grids need to optimize the resources required for power production. Thus, predicting the energy consumption of clients is becoming an important part of every energy management system. Energy usage information collected by the clients' smart homes can be used to train a deep neural network to predict the future energy demand. Collecting data from a large number of distributed clients for centralized model training is expensive in terms of communication resources. To take advantage of distributed data in edge systems, centralized training can be replaced by federated learning, where each client only needs to upload model updates produced by training on its local data. These model updates are aggregated into a single global model by the server. However, since different clients can have different attributes, their model updates can differ widely, and as a result, it can take a long time for the aggregated global model to converge. To speed up convergence, we can apply clustering to group clients based on their properties and aggregate model updates from the same cluster together to produce a cluster-specific global model. In this paper, we propose a recurrent neural network based energy demand predictor, trained with federated learning on clustered clients to take advantage of distributed data and speed up the convergence process.
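The clustered aggregation step can be sketched as federated averaging performed separately within each cluster, with each client weighted by its number of local samples. That weighting is an assumption (standard FedAvg), cluster labels are taken as given, and how clients are grouped by their properties, as well as the recurrent predictor itself, are outside this hypothetical example.

```python
import numpy as np
from collections import defaultdict

def clustered_fedavg(client_weights, num_samples, cluster_ids):
    """Average client model updates separately inside each cluster, weighted by local dataset size."""
    groups = defaultdict(list)
    for w, n, c in zip(client_weights, num_samples, cluster_ids):
        groups[c].append((np.asarray(w, dtype=float), n))
    cluster_models = {}
    for c, members in groups.items():
        total = sum(n for _, n in members)
        cluster_models[c] = sum(w * (n / total) for w, n in members)
    return cluster_models

# Hypothetical flattened parameter vectors from four smart-home clients.
weights = [[1.0, 1.0], [3.0, 3.0], [10.0, 0.0], [20.0, 2.0]]
samples = [100, 300, 50, 50]
clusters = ["residential", "residential", "commercial", "commercial"]
models = clustered_fedavg(weights, samples, clusters)
print(models["residential"], models["commercial"])   # residential -> 2.5, 2.5; commercial -> 15, 1
```

Each cluster-specific model would then be sent back only to the clients of that cluster for the next training round.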

Submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted by BigComp 2021

arXiv:2209.10797

DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation

Authors: Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim

Abstract: Transformer is a deep learning language model widely used for natural language processing (NLP) services in datacenters. Among transformer models, Generative Pre-trained Transformer (GPT) has achieved remarkable performance in text generation, or natural language generation (NLG), which requires processing a large input context in the summarization stage, followed by the generation stage that produces a single word at a time. Conventional platforms such as GPUs are specialized for the parallel processing of large inputs in the summarization stage, but their performance significantly degrades in the generation stage due to its sequential nature. Therefore, an efficient hardware platform is required to address the high latency caused by the sequential characteristic of text generation. In this paper, we present DFX, a multi-FPGA acceleration appliance that executes GPT-2 model inference end-to-end with low latency and high throughput in both summarization and generation stages. DFX uses model parallelism and optimized dataflow that is model-and-hardware-aware for fast simultaneous workload execution among devices. Its compute cores operate on custom instructions and provide GPT-2 operations end-to-end. We implement the proposed hardware architecture on four Xilinx Alveo U280 FPGAs and utilize all of the channels of the high bandwidth memory (HBM) and the maximum number of compute resources for high hardware efficiency. DFX achieves 5.58x speedup and 3.99x energy efficiency over four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is also 8.21x more cost-effective than the GPU appliance, suggesting that it is a promising solution for text generation workloads in cloud datacenters.

Submitted 22 September, 2022; originally announced September 2022.

Comments: Extension of HOTCHIPS 2022 and accepted in MICRO 2022

arXiv:2208.07606

RIS-Aided Localization Algorithm and Analysis: Tackling Non-Gaussian Angle Estimation Errors

Authors: Tuo Wu, Hong Ren, Cunhua Pan, Yi** Pan, Sheng Hong, Maged Elkashlan, Feng Shu, Jiangzhou Wang

Abstract: Reconfigurable intelligent surface (RIS)-aided localization systems are increasingly recognized for enhancing accuracy in internet of things (IoT) networks. However, prevailing studies tend to either assume a Gaussian distribution for angle estimation error (AEE) or neglect the impact of the AEE altogether, overlooking its non-Gaussian nature in real-world scenarios, particularly with diverse estimation methods (e.g., the 2D-DFT algorithm). Addressing this oversight, this paper explores the design and performance analysis of RIS-aided localization systems, specifically tackling non-Gaussian AEE. We adopt the classical two-step three-dimensional (3D) localization scheme to determine the position of the mobile user (MU). Initially, we estimate angles of arrival (AoAs) and time differences of arrival (TDoAs) at the RIS using different methods, resulting in non-Gaussian and Gaussian errors, respectively. Subsequently, to accommodate the non-Gaussian nature of AoA errors and the Gaussian character of TDoA errors, we design a multiple weighted least squares (mWLS) algorithm to accurately localize the MU. Besides, our research also includes a unique bias analysis for evaluating the performance of the proposed localization algorithm under both Gaussian and non-Gaussian errors. Simulation results demonstrate the effectiveness of both the proposed mWLS algorithm and the bias analysis methodology.

Submitted 18 March, 2024; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: Keywords: Reconfigurable intelligent surface (RIS), intelligent reflecting surface (IRS)

arXiv:2208.07602

Joint Angle Estimation Error Analysis and 3D Positioning Algorithm Design for mmWave Positioning System

Authors: Tuo Wu, Cunhua Pan, Yi** Pan, Sheng Hong, Hong Ren, Maged Elkashlan, Feng Shu, Jiangzhou Wang

Abstract: In this paper, we propose a comprehensive framework to jointly analyze the angle estimation error and design the three-dimensional (3D) positioning algorithm for a millimeter wave (mmWave) positioning system. First, we estimate the angles of arrival (AoAs) at the anchors by applying the two-dimensional discrete Fourier transform (2D-DFT) algorithm. Based on the properties of the 2D-DFT algorithm, the angle estimation error is analyzed in terms of probability density functions (PDF). The analysis shows that the derived angle estimation error is non-Gaussian, in contrast to existing work. Second, the intricate expression of the PDF of the AoA estimation error is simplified by employing the first-order linear approximation of trigonometric functions. Then, we derive a complex expression for the variance based on the derived PDF. Specifically, for the azimuth estimation error, the variance is separately integrated according to the different non-zero intervals of the PDF. Finally, we apply the weighted least squares (WLS) algorithm to estimate the 3D position of the mobile user (MU) by using the estimated AoAs and the obtained non-Gaussian variance. Extensive simulation results confirm that the derived angle estimation error is non-Gaussian, and also demonstrate the superiority of the proposed framework.
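A weighted least-squares position fix from AoAs can be sketched by intersecting bearing lines: each anchor's estimated azimuth and elevation define a ray, and the position minimizes the weighted squared distances to all rays, with weights equal to the inverse AoA-error variances. This closed-form formulation, the anchor layout, and the variance values are assumptions; the paper's WLS may be formulated differently, and here the variances are simply inputs rather than being derived from the non-Gaussian PDF.

```python
import numpy as np

def direction(azimuth: float, elevation: float) -> np.ndarray:
    """Unit pointing vector for an AoA given in radians."""
    return np.array([np.cos(elevation) * np.cos(azimuth),
                     np.cos(elevation) * np.sin(azimuth),
                     np.sin(elevation)])

def wls_position(anchors, azimuths, elevations, variances):
    """Weighted least-squares intersection of bearing lines.

    Anchor a_i with unit direction u_i constrains the MU position p through
    (I - u_i u_i^T)(p - a_i) = 0; minimizing the weighted residuals gives
    p = (sum_i w_i P_i)^-1 sum_i w_i P_i a_i with w_i = 1 / variance_i.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for a, az, el, var in zip(anchors, azimuths, elevations, variances):
        u = direction(az, el)
        P = np.eye(3) - np.outer(u, u)          # projector orthogonal to the bearing
        w = 1.0 / var
        A += w * P
        b += w * P @ a
    return np.linalg.solve(A, b)

anchors = np.array([[0.0, 0.0, 3.0], [10.0, 0.0, 3.0], [0.0, 10.0, 3.0]])
mu = np.array([4.0, 6.0, 1.5])                  # hypothetical true MU position (m)
d = mu - anchors
az = np.arctan2(d[:, 1], d[:, 0])
el = np.arctan2(d[:, 2], np.hypot(d[:, 0], d[:, 1]))
p_hat = wls_position(anchors, az, el, variances=[1e-4, 2e-4, 5e-4])
print(p_hat)                                    # recovers mu up to floating-point error (noiseless AoAs)
```

With noisy AoAs, anchors whose angle estimates have smaller variance pull the solution more strongly, which is where an accurate (here, non-Gaussian) variance model enters.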

Submitted 14 November, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: Keywords: mmWave Positioning System
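The final WLS step can be illustrated in a simplified setting. The sketch below is a minimal 2D example, assuming each anchor at a known position reports a single azimuth with an assumed error variance; the paper works in 3D with azimuth and elevation and weights by its derived non-Gaussian variances, which are not reproduced here. Each bearing defines a line through its anchor, and the position solves the weighted normal equations of those line constraints. The function name `wls_position` is hypothetical.

```python
import math

def wls_position(anchors, angles, variances):
    """Estimate a 2D position from bearing (AoA) measurements by weighted least squares.

    anchors: list of (x_i, y_i); angles: azimuths theta_i in radians;
    variances: assumed AoA error variances sigma_i^2 (weights are 1 / sigma_i^2).
    """
    # Each bearing gives the line sin(th)*(x - x_i) - cos(th)*(y - y_i) = 0,
    # i.e. one row a_i . p = b_i. Accumulate (A^T W A) p = A^T W b.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), th, var in zip(anchors, angles, variances):
        a1, a2 = math.sin(th), -math.cos(th)  # row a_i of A
        b = a1 * xi + a2 * yi                  # right-hand side b_i
        w = 1.0 / var
        s11 += w * a1 * a1
        s12 += w * a1 * a2
        s22 += w * a2 * a2
        t1 += w * a1 * b
        t2 += w * a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("bearing lines are (nearly) parallel; position is unobservable")
    # Closed-form inverse of the symmetric 2x2 system.
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y
```

With noise-free angles from anchors at (0, 0), (10, 0), and (0, 10) toward a target at (3, 4), the estimate is (3, 4) regardless of the weights; with noisy angles, anchors with smaller assumed variance pull the solution harder.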

arXiv:2208.07536

Dependency Tasks Offloading and Communication Resource Allocation in Collaborative UAVs Networks: A Meta-Heuristic Approach

Authors: Loc X. Nguyen, Yan Kyaw Tun, Tri Nguyen Dang, Yu Min Park, Zhu Han, Choong Seon Hong

Abstract: In recent years, unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) systems have been investigated by researchers as a promising solution for providing computation services to mobile users outside terrestrial infrastructure coverage. However, it remains challenging for standalone MEC-enabled UAVs to meet the computation requirements of numerous mobile users due to the limited computation capacity of their onboard servers and their limited battery life. Therefore, we propose a collaborative scheme among UAVs so that UAVs can share their workload with idle UAVs. Moreover, current task offloading strategies frequently overlook task topology, which may result in poor performance or even system failure. To address this problem, we consider offloading tasks consisting of a set of sub-tasks, where each sub-task depends on other sub-tasks, which reflects practical real-world applications. Sub-tasks with dependencies must wait for the results of their preceding sub-tasks before being executed, which significantly affects the offloading strategy. We then formulate an optimization problem to minimize the average latency experienced by users by jointly controlling the offloading decisions for dependent tasks and allocating the communication resources of UAVs. The formulated problem appears to be NP-hard and cannot be solved in polynomial time. Therefore, we divide the problem into two sub-problems: the offloading decision problem and the communication resource allocation problem. A meta-heuristic method is then proposed to find a sub-optimal solution to the task offloading problem, while the communication resource allocation problem is solved using convex optimization. Finally, we perform extensive simulation experiments, and the results show that the proposed offloading technique effectively minimizes the average latency of users compared with other benchmark schemes.

Submitted 16 August, 2022; originally announced August 2022.

Comments: 14 pages, 9 figures
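The dependency constraint (a sub-task cannot start until the results of its predecessors arrive) can be sketched as a longest-path computation over a directed acyclic graph. This is a minimal sketch under hypothetical assumptions, not the paper's latency model or its meta-heuristic: each sub-task's execution time depends only on the UAV it is placed on, a result passed between different UAVs costs a fixed transfer delay, and queuing when several sub-tasks share a UAV is ignored. `task_latency` and its parameters are illustrative names; the `placement` argument plays the role of the offloading decision a search method would explore.

```python
from graphlib import TopologicalSorter

def task_latency(preds, exec_time, placement, transfer_delay):
    """Completion time of a dependent task graph under a given placement.

    preds: dict sub-task -> set of predecessor sub-tasks (the task topology).
    exec_time: dict (sub-task, node) -> execution time on that node (hypothetical).
    placement: dict sub-task -> node (e.g. a UAV id), i.e. the offloading decision.
    transfer_delay: time to pass a result between two different nodes (hypothetical constant).
    """
    finish = {}
    # static_order() yields every sub-task only after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        ready = 0.0
        for p in preds.get(task, ()):
            delay = 0.0 if placement[p] == placement[task] else transfer_delay
            ready = max(ready, finish[p] + delay)  # wait for the slowest preceding result
        finish[task] = ready + exec_time[(task, placement[task])]
    return max(finish.values())
```

For a diamond-shaped graph a → {b, c} → d, moving the slow sub-task c to a faster UAV can shorten the overall latency even after paying the transfer delay twice, which is the kind of trade-off an offloading search has to weigh.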

arXiv:2207.05364

A Bipartite Graph Neural Network Approach for Scalable Beamforming Optimization

Authors: Junbeom Kim, Hoon Lee, Seung-Eun Hong, Seok-Hwan Park

Abstract: Deep learning (DL) techniques have been intensively studied for the optimization of multi-user multiple-input single-output (MU-MISO) downlink systems owing to their capability of handling nonconvex formulations. However, the fixed computation structure of existing deep neural networks (DNNs) lacks flexibility with respect to the system size, i.e., the number of antennas or users. This paper develops a bipartite graph neural network (BGNN) framework, a scalable DL solution designed for multi-antenna beamforming optimization. The MU-MISO system is first characterized by a bipartite graph in which two disjoint vertex sets, one consisting of transmit antennas and the other of users, are connected via pairwise edges. These vertex interconnection states are modeled by channel fading coefficients. Thus, a generic beamforming optimization process is interpreted as a computation task over a weighted bipartite graph. This approach partitions the beamforming optimization procedure into multiple suboperations dedicated to individual antenna vertices and user vertices. Separated vertex operations lead to scalable beamforming calculations that are invariant to the system size. The vertex operations are realized by a group of DNN modules that collectively form the BGNN architecture. Identical DNNs are reused at all antennas and users so that the resultant learning structure becomes flexible to the network size. Component DNNs of the BGNN are trained jointly over numerous MU-MISO configurations with randomly varying network sizes. As a result, the trained BGNN can be universally applied to arbitrary MU-MISO systems. Numerical results validate the advantages of the BGNN framework over conventional methods.

Submitted 12 July, 2022; originally announced July 2022.

Comments: accepted for publication in IEEE Transactions on Wireless Communications
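The size-invariance argument can be seen in a toy version of bipartite message passing. In this minimal sketch, which is not the paper's BGNN, channels are real scalars, each vertex type uses one shared update with hypothetical untrained weights in `params`, and messages are averaged so the same parameters apply to any number of antennas M and users K. The edge readout and power normalization are assumptions chosen only to produce an M × K beamforming matrix.

```python
import math

def shared_update(own, msg, w_self, w_msg):
    # One shared "module": the same weights are used at every vertex of a given type.
    return math.tanh(w_self * own + w_msg * msg)

def bgnn_layer(ant, usr, h, params):
    """One message-passing round on the antenna-user bipartite graph.

    ant: M antenna features; usr: K user features; h[m][k]: real channel coefficient
    on edge (m, k) (a simplification of complex fading coefficients).
    """
    M, K = len(ant), len(usr)
    # Mean aggregation keeps each update independent of how many neighbours a vertex has.
    new_ant = [shared_update(ant[m], sum(h[m][k] * usr[k] for k in range(K)) / K,
                             params["a_self"], params["a_msg"]) for m in range(M)]
    new_usr = [shared_update(usr[k], sum(h[m][k] * ant[m] for m in range(M)) / M,
                             params["u_self"], params["u_msg"]) for k in range(K)]
    return new_ant, new_usr

def beamformer(h, params, power=1.0, rounds=2):
    """Toy M x K beamforming matrix from the channel h, scaled to a total power budget."""
    M, K = len(h), len(h[0])
    ant, usr = [1.0] * M, [1.0] * K
    for _ in range(rounds):
        ant, usr = bgnn_layer(ant, usr, h, params)
    # Hypothetical edge readout: weight for antenna m serving user k.
    w = [[ant[m] * usr[k] * h[m][k] for k in range(K)] for m in range(M)]
    norm = math.sqrt(sum(x * x for row in w for x in row))
    if norm == 0.0:
        raise ValueError("all-zero readout; cannot normalize")
    return [[math.sqrt(power) * x / norm for x in row] for row in w]
```

Because the same functions run at every vertex, `beamformer` accepts a 2 × 3 or a 4 × 5 channel with the same `params`, and permuting the users simply permutes the columns of the output.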

arXiv:2205.12925

These Maps Are Made For Walking: Real-Time Terrain Property Estimation for Mobile Robots

Authors: Parker Ewen, Adam Li, Yuxin Chen, Steven Hong, Ram Vasudevan

Abstract: The equations of motion governing mobile robots depend on terrain properties such as the coefficient of friction and contact model parameters. Estimating these properties is thus essential for robotic navigation. Ideally, any map estimating terrain properties should run in real time, mitigate sensor noise, and provide probability distributions of the aforementioned properties, thus enabling risk-mitigating navigation and planning. This paper addresses these needs and proposes a Bayesian inference framework for semantic mapping which recursively estimates both the terrain surface profile and a probability distribution for terrain properties using data from a single RGB-D camera. The proposed framework is evaluated in simulation against other semantic mapping methods and is shown to outperform these state-of-the-art methods in correctly estimating simulated ground-truth terrain properties when evaluated using a precision-recall curve and the Kullback-Leibler divergence test. Additionally, the proposed method is deployed on a physical legged robotic platform in both indoor and outdoor environments, and we show that our method correctly predicts terrain properties in both cases. The proposed framework runs in real time and includes a ROS interface for easy integration.

Submitted 25 May, 2022; originally announced May 2022.
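A recursive Bayesian estimate of a terrain property can be illustrated with the simplest conjugate case. This minimal sketch is not the paper's framework: it assumes each map cell holds a Gaussian belief over one scalar property (for example, a friction coefficient) and that each observation carries Gaussian noise of known variance, so every update is a scalar Kalman step that narrows the distribution. `CellEstimate` and the numbers below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CellEstimate:
    """Gaussian belief over one terrain property (e.g. friction) in one map cell."""
    mean: float
    var: float

    def update(self, z: float, noise_var: float) -> None:
        # Conjugate Gaussian (scalar Kalman) update with a noisy measurement z.
        gain = self.var / (self.var + noise_var)
        self.mean += gain * (z - self.mean)
        self.var *= 1.0 - gain  # posterior variance always shrinks

cell = CellEstimate(mean=0.5, var=0.04)   # hypothetical prior
for z in (0.72, 0.68, 0.70):              # hypothetical observations
    cell.update(z, noise_var=0.01)
```

After these three observations the variance is 1/325 (prior precision 25 plus three measurement precisions of 100) and the mean is 222.5/325, about 0.685, so the map keeps a full distribution rather than a point estimate.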

arXiv:2205.01167

3D Convolutional Neural Networks for Dendrite Segmentation Using Fine-Tuning and Hyperparameter Optimization

Authors: Jim James, Nathan Pruyne, Tiberiu Stan, Marcus Schwarting, Jiwon Yeom, Seungbum Hong, Peter Voorhees, Ben Blaiszik, Ian Foster

Abstract: Dendritic microstructures are ubiquitous in nature and are the primary solidification morphologies in metallic materials. Techniques such as X-ray computed tomography (XCT) have provided new insights into dendritic phase transformation phenomena. However, manual identification of dendritic morphologies in microscopy data can be both labor-intensive and potentially ambiguous. The analysis of 3D datasets is particularly challenging due to their large sizes (terabytes) and the presence of artifacts scattered within the imaged volumes. In this study, we trained 3D convolutional neural networks (CNNs) to segment 3D datasets. Three CNN architectures were investigated, including a new 3D version of FCDense. We show that, using hyperparameter optimization (HPO) and fine-tuning techniques, both 2D and 3D CNN architectures can be trained to outperform the previous state of the art. The 3D U-Net architecture trained in this study produced the best segmentations according to quantitative metrics (pixel-wise accuracy of 99.84% and a boundary displacement error of 0.58 pixels), while 3D FCDense produced the smoothest boundaries and the best segmentations according to visual inspection. The trained 3D CNNs are able to segment entire 852 × 852 × 250 voxel 3D volumes in only ~60 seconds, thus hastening progress towards a deeper understanding of phase transformation phenomena such as dendritic solidification.

Submitted 2 May, 2022; originally announced May 2022.
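For scale, the quoted timing implies the following approximate throughput, derived only from the numbers stated in the abstract (the ~60 s figure is itself approximate):

```latex
852 \times 852 \times 250 = 181{,}476{,}000 \ \text{voxels}, \qquad
\frac{181{,}476{,}000 \ \text{voxels}}{60 \ \text{s}} = 3{,}024{,}600 \ \text{voxels/s} \approx 3.0 \times 10^{6} \ \text{voxels/s}.
```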

arXiv:2204.14175

doi 10.1117/12.2613274

Segmentation of kidney stones in endoscopic video feeds

Authors: Zachary A Stoebner, Daiwei Lu, Seok Hee Hong, Nicholas L Kavoussi, Ipek Oguz

Abstract: Image segmentation has been increasingly applied in medical settings as recent developments have greatly expanded the potential applications of deep learning. Urology, specifically, is one field of medicine that is primed for the adoption of a real-time image segmentation system, with the long-term aim of automating endoscopic stone treatment. In this project, we explored supervised deep learning models to annotate kidney stones in surgical endoscopic video feeds. In this paper, we describe how we built a dataset from the raw videos and how we developed a pipeline to automate as much of the process as possible. For the segmentation task, we adapted and analyzed three baseline deep learning models (U-Net, U-Net++, and DenseNet) to predict annotations on the frames of the endoscopic videos, with the best accuracy exceeding 90%. To show clinical potential for real-time use, we also confirmed that our best trained model can accurately annotate new videos at 30 frames per second. Our results justify continued development and study of image segmentation to annotate ureteroscopic video feeds.

Submitted 29 April, 2022; originally announced April 2022.

Comments: Published in SPIE Medical Imaging: Image Processing 2022 (9 pages, 5 figures, 1 table)

Journal ref: Proceedings Volume 12032, Medical Imaging 2022: Image Processing; 120323G (2022)

arXiv:2203.09487

Defending Against Adversarial Attack in ECG Classification with Adversarial Distillation Training

Authors: Jiahao Shao, Shijia Geng, Zhaoji Fu, Weilun Xu, Tong Liu, Shenda Hong

Abstract: In clinics, doctors rely on electrocardiograms (ECGs) to assess severe cardiac disorders. Owing to the development of technology and increased health awareness, ECG signals are currently obtained using both medical and commercial devices. Deep neural networks (DNNs) can be used to analyze these signals because of their high accuracy. However, researchers have found that adversarial attacks can significantly reduce the accuracy of DNNs. Studies have been conducted to defend ECG-based DNNs against traditional adversarial attacks, such as projected gradient descent (PGD), and against smooth adversarial perturbation (SAP), which targets ECG classification; however, to the best of our knowledge, no study has completely explored the defense against adversarial attacks targeting ECG classification. Thus, we conducted experiments to explore the effects of defense methods against white-box and black-box adversarial attacks targeting ECG classification, and we found that some common defense methods performed well against these attacks. In addition, we propose a new defense method called Adversarial Distillation Training (ADT), which is derived from defensive distillation and can effectively improve the generalization performance of DNNs. The results show that our method performed more effectively against adversarial attacks targeting ECG classification than the other baseline methods, namely, adversarial training, defensive distillation, Jacobian regularization, and noise-to-signal ratio regularization. Furthermore, we found that our method performed better against PGD attacks with low noise levels, which indicates stronger robustness.

Submitted 14 March, 2022; originally announced March 2022.
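Projected gradient descent, one of the attacks evaluated above, repeatedly steps the input along the sign of the loss gradient and projects it back into an L∞ ball of radius ε around the clean input. The sketch below shows that standard iteration on a hypothetical logistic-regression classifier over a short feature vector, chosen because its input gradient has a closed form; it does not implement ADT or the paper's ECG models, and `eps`, `alpha`, and `steps` are arbitrary illustrative values.

```python
import math

def loss_and_grad(w, b, x, y):
    """Logistic loss log(1 + exp(-y * (w.x + b))) for y in {-1, +1}, and its gradient w.r.t. x."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    # log1p(exp(-m)) is numerically safe unless -m is huge, where it is ~ -m.
    loss = math.log1p(math.exp(-margin)) if margin > -30 else -margin
    s = 1.0 / (1.0 + math.exp(margin))  # sigmoid(-margin); fine for toy-sized margins
    grad = [-y * s * wi for wi in w]    # d loss / d x
    return loss, grad

def pgd_attack(w, b, x0, y, eps, alpha, steps):
    """L-infinity PGD: ascend the loss, then project back into the eps-ball around x0."""
    x = list(x0)
    for _ in range(steps):
        _, g = loss_and_grad(w, b, x, y)
        # Signed-gradient step (sign is 0 when the gradient component is 0).
        x = [xi + alpha * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
        # Projection: clip each coordinate to [x0_i - eps, x0_i + eps].
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x
```

Smaller `eps` corresponds to the "low noise level" regime mentioned in the abstract; a defense is judged by how much accuracy survives as `eps` grows.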

arXiv:2203.00512

A Deep Bayesian Neural Network for Cardiac Arrhythmia Classification with Rejection from ECG Recordings

Authors: Wenrui Zhang, Xinxin Di, Guodong Wei, Shijia Geng, Zhaoji Fu, Shenda Hong

Abstract: With the development of deep learning-based methods, automated classification of electrocardiograms (ECGs) has recently gained much attention. Although the effectiveness of deep neural networks has been encouraging, the limited information provided by their outputs restricts clinicians' reexamination. If uncertainty estimates accompany the classification results, cardiologists can pay more attention to "uncertain" cases. Our study aims to classify ECGs with rejection based on data uncertainty and model uncertainty. We perform experiments on a real-world 12-lead ECG dataset. First, we estimate uncertainties using Monte Carlo dropout for each classification prediction, based on our Bayesian neural network. Then, we accept predictions whose uncertainty is below a given threshold and flag the remaining "uncertain" cases for clinicians. Furthermore, we perform a simulation experiment using varying thresholds. Finally, with the help of a clinician, we conduct case studies to explain the results of large uncertainties and of incorrect predictions with small uncertainties. The results show that correct predictions are more likely to have smaller uncertainties, and the performance on accepted predictions improves as the acceptance ratio decreases (i.e., more rejections). Case studies also help explain why rejection can improve performance. Our study helps neural networks produce more accurate results and provide information on uncertainties to better assist clinicians in the diagnostic process. It can also facilitate the clinical implementation of deep-learning-based ECG interpretation.

Submitted 25 February, 2022; originally announced March 2022.
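The accept/reject logic can be sketched with Monte Carlo dropout on a toy model: keep dropout active at inference, average the softmax outputs of many stochastic passes, and defer the case when the uncertainty exceeds a threshold. Assumptions: the single-layer classifier and its weights are hypothetical, and the predictive entropy of the averaged probabilities serves as the uncertainty score, a common choice that lumps data and model uncertainty together rather than separating them as the paper does.

```python
import math
import random

def mc_dropout_predict(weights, x, p_drop=0.5, passes=200, rng=None):
    """Mean class probabilities of a hypothetical single-layer softmax classifier
    over `passes` forward passes with (inverted) dropout kept active on the inputs."""
    rng = rng or random.Random(0)
    n_cls = len(weights)
    mean = [0.0] * n_cls
    for _ in range(passes):
        # Drop each feature with probability p_drop; rescale survivors by 1 / (1 - p_drop).
        mask = [0.0 if rng.random() < p_drop else 1.0 / (1.0 - p_drop) for _ in x]
        xd = [xi * mi for xi, mi in zip(x, mask)]
        logits = [sum(wc_j * v for wc_j, v in zip(wc, xd)) for wc in weights]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]  # stable softmax
        z = sum(exps)
        for c in range(n_cls):
            mean[c] += exps[c] / z / passes
    return mean

def predictive_entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def classify_with_rejection(weights, x, threshold, **kw):
    """Return the predicted class, or None ("uncertain", defer to a clinician)."""
    probs = mc_dropout_predict(weights, x, **kw)
    if predictive_entropy(probs) > threshold:
        return None
    return max(range(len(probs)), key=probs.__getitem__)
```

Lowering `threshold` rejects more cases; sweeping it reproduces the kind of accuracy-versus-acceptance-ratio curve the abstract describes.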

arXiv:2202.12458

A Simple Self-Supervised ECG Representation Learning Method via Manipulated Temporal-Spatial Reverse Detection

Authors: Wenrui Zhang, Shijia Geng, Shenda Hong

Abstract: Learning representations from electrocardiogram (ECG) signals can serve as a fundamental step for different machine learning-based ECG tasks. In order to extract general ECG representations that can be adapted to various downstream tasks, the learning process needs to be based on a general ECG-related task which can be achieved through self-supervised learning (SSL). However, existing SSL approaches either fail to provide satisfactory ECG representations or require too much effort to construct the learning data. In this paper, we propose the T-S reverse detection, a simple yet effective self-supervised approach to learn ECG representations. Inspired by the temporal and spatial characteristics of ECG signals, we flip the original signals horizontally (temporal reverse), vertically (spatial reverse), and both horizontally and vertically (temporal-spatial reverse). Learning is then done by classifying four types of signals including the original one. To verify the effectiveness of the proposed method, we perform a downstream task to detect atrial fibrillation (AF) which is one of the most common ECG tasks. The results show that the ECG representations learned with our method achieve remarkable performance. Furthermore, after exploring the representation feature space and investigating salient ECG locations, we conclude that the temporal reverse is more effective for learning ECG representations than the spatial reverse.

Submitted 21 September, 2022; v1 submitted 24 February, 2022; originally announced February 2022.
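The pretext task is simple enough to state directly in code. This sketch builds the four labeled views of one multi-lead signal; it assumes a signal is stored as a list of leads (leads × time) and interprets the vertical flip as negating amplitudes about zero. A classifier trained to predict the label then serves as the representation learner; the function name is hypothetical.

```python
def ts_reverse_views(signal):
    """Four pretext-task views of a multi-lead ECG (list of leads, each a list of samples).

    Labels: 0 original, 1 temporal reverse, 2 spatial reverse, 3 temporal-spatial reverse.
    """
    temporal = [lead[::-1] for lead in signal]             # horizontal flip: reverse time
    spatial = [[-v for v in lead] for lead in signal]       # vertical flip: negate amplitude
    both = [[-v for v in lead[::-1]] for lead in signal]    # both flips
    return [(signal, 0), (temporal, 1), (spatial, 2), (both, 3)]

def pretext_dataset(signals):
    """Flatten many unlabeled recordings into (view, label) training pairs."""
    return [pair for s in signals for pair in ts_reverse_views(s)]
```

No manual labels are needed: every unlabeled recording yields four training examples whose labels come from the transformation applied.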

arXiv:2202.02924

3TO: THz-Enabled Throughput and Trajectory Optimization of UAVs in 6G Networks by Proximal Policy Optimization Deep Reinforcement Learning

Authors: Sheikh Salman Hassan, Yu Min Park, Yan Kyaw Tun, Walid Saad, Zhu Han, Choong Seon Hong

Abstract: Next-generation networks need to meet ubiquitous and high data-rate demand. Therefore, this paper considers the throughput and trajectory optimization of terahertz (THz)-enabled unmanned aerial vehicles (UAVs) in sixth-generation (6G) communication networks. In the considered scenario, multiple UAVs must provide on-demand terabits per second (Tb/s) services to an urban area along with existing terrestrial networks. However, THz-empowered UAVs pose some new constraints, e.g., dynamic THz-channel conditions for ground user (GU) association and UAV trajectory optimization to fulfill GUs' throughput demands. Thus, a framework is proposed to address these challenges, in which a joint UAV-GU association, transmit power, and trajectory optimization problem is studied. The formulated problem is a mixed-integer nonlinear program (MINLP), which is NP-hard to solve. Consequently, an iterative algorithm is proposed to solve three sub-problems iteratively, i.e., UAV-GU association, transmit power, and trajectory optimization. Simulation results demonstrate that the proposed algorithm increased the throughput by up to 10%, 68.9%, and 69.1%, respectively, compared to baseline algorithms.

Submitted 6 February, 2022; originally announced February 2022.

arXiv:2202.02533

Blue Data Computation Maximization in 6G Space-Air-Sea Non-Terrestrial Networks

Authors: Sheikh Salman Hassan, Yan Kyaw Tun, Walid Saad, Zhu Han, Choong Seon Hong

Abstract: Non-terrestrial networks (NTN), encompassing space and air platforms, are a key component of the upcoming sixth-generation (6G) cellular network. Meanwhile, maritime network traffic has grown significantly in recent years due to sea transportation used for national defense, research, recreational activities, and domestic and international trade. In this paper, the demand for seamless and reliable communication and computation in maritime wireless networks is investigated. Two types of marine user equipment (UEs), i.e., low-antenna-gain and high-antenna-gain UEs (HUEs), are considered. A joint task computation and time allocation problem for weighted sum-rate maximization is formulated as a mixed-integer linear program (MILP). The goal is to design an algorithm that enables the network to efficiently provide backhaul resources to an unmanned aerial vehicle (UAV) and offload HUE tasks to a low Earth orbit (LEO) satellite for blue data (i.e., marine users' data). To solve this MILP, a solution based on Benders and primal decomposition is proposed. Benders decomposition splits the MILP into a master problem for the binary task decisions and a subproblem for continuous-time resource allocation. Moreover, primal decomposition deals with a coupling constraint in the subproblem. Finally, numerical results demonstrate that the proposed algorithm meets the coverage demand of maritime UEs with polynomial-time computational complexity and achieves a near-optimal solution.

Submitted 5 February, 2022; originally announced February 2022.

arXiv:2201.08321

doi 10.1109/LRA.2022.3184769

TOAST: Trajectory Optimization and Simultaneous Tracking using Shared Neural Network Dynamics

Authors: Taekyung Kim, Hojin Lee, Seongil Hong, Wonsuk Lee

Abstract: Neural networks have been increasingly employed in Model Predictive Control (MPC) to control nonlinear dynamic systems. However, the achievable update rate of MPC is still insufficient to cope with model uncertainty and external disturbances. In this paper, we present a novel control scheme that designs an optimal tracking controller using the neural network dynamics of the MPC, allowing it to be applied as a plug-and-play extension to any existing model-based feedforward controller. We also describe how our method handles a neural network containing history information, which does not follow a general form of dynamics. The proposed method is evaluated by its performance in classical control benchmarks with external disturbances. We also extend our control framework to an aggressive autonomous driving task with unknown friction. In all experiments, our method outperformed the compared methods by a large margin. Our controller also showed low control chattering levels, demonstrating that our feedback controller does not interfere with the optimal command of MPC.

Submitted 14 July, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted to IEEE Robotics and Automation Letters (and IROS 2022). Our video can be found at https://youtu.be/YQG0yHE5jWw

Journal ref: IEEE Robotics and Automation Letters, 2022
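One common way to turn learned dynamics into a feedback tracking controller is to linearize them along the nominal MPC trajectory and compute time-varying LQR gains. The sketch below illustrates that general idea for a scalar system; it is an assumption about the flavor of the approach, not TOAST's actual formulation (which also handles networks with history inputs). The dynamics `f` is a hypothetical stand-in for a neural network, Jacobians are taken by finite differences, and the weights `q` and `r` are arbitrary.

```python
import math

def f(x, u):
    """Hypothetical stand-in for learned (neural network) dynamics: x_{t+1} = f(x_t, u_t)."""
    return x + 0.1 * (math.sin(x) + u)

def linearize(f, x, u, eps=1e-5):
    """Central finite-difference Jacobians a = df/dx and b = df/du at (x, u)."""
    a = (f(x + eps, u) - f(x - eps, u)) / (2 * eps)
    b = (f(x, u + eps) - f(x, u - eps)) / (2 * eps)
    return a, b

def rollout(f, x0, us):
    xs = [x0]
    for u in us:
        xs.append(f(xs[-1], u))
    return xs

def tvlqr_gains(f, xs, us, q=1.0, r=0.01):
    """Backward Riccati recursion for the scalar deviation system along (xs, us)."""
    p = q  # terminal weight on the state deviation
    gains = [0.0] * len(us)
    for t in reversed(range(len(us))):
        a, b = linearize(f, xs[t], us[t])
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * k
        gains[t] = k
    return gains

def track(f, xs_nom, us_nom, gains, disturbance, feedback=True):
    """Run the disturbed system; return the final deviation from the nominal trajectory."""
    x = xs_nom[0]
    for t, u_nom in enumerate(us_nom):
        # Nominal (feedforward) command plus, optionally, the LQR correction.
        u = u_nom - gains[t] * (x - xs_nom[t]) if feedback else u_nom
        x = f(x, u) + disturbance
    return abs(x - xs_nom[-1])
```

The feedforward plan (`us_nom`) stays untouched; the gains only act on deviations, which is what lets such a feedback layer sit on top of an existing controller.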

arXiv:2112.06693

Hypernet-Ensemble Learning of Segmentation Probability for Medical Image Segmentation with Ambiguous Labels

Authors: Sungmin Hong, Anna K. Bonkhoff, Andrew Hoopes, Martin Bretzner, Markus D. Schirmer, Anne-Katrin Giese, Adrian V. Dalca, Polina Golland, Natalia S. Rost

Abstract: Despite the superior performance of Deep Learning (DL) on numerous segmentation tasks, DL-based approaches are notoriously overconfident about their predictions, with highly polarized label probabilities. This is often undesirable in applications with inherent label ambiguity, even in human annotations. This challenge has been addressed by leveraging multiple annotations per image and the segmentation uncertainty. However, multiple per-image annotations are often not available in real-world applications, and the uncertainty does not give users full control over segmentation results. In this paper, we propose novel methods to improve segmentation probability estimation without sacrificing performance in a real-world scenario where only one ambiguous annotation per image is available. We marginalize the estimated segmentation probability maps of networks that are encouraged to under-/over-segment with the varying Tversky loss without penalizing balanced segmentation. Moreover, we propose a unified hypernetwork ensemble method to alleviate the computational burden of training multiple networks. Our approaches successfully estimated segmentation probability maps that reflected the underlying structures and provided intuitive control over segmentation for challenging 3D medical image segmentation. Although the main focus of our proposed methods is not to improve binary segmentation performance, our approaches marginally outperformed state-of-the-art methods. The code is available at https://github.com/sh4174/HypernetEnsemble.

Submitted 13 December, 2021; originally announced December 2021.

MSC Class: 68T07 (Primary) 92C55; 94A08 (Secondary) ACM Class: I.4; I.4.6; I.2; I.2.1; I.5.1; I.5.4; J.3
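The under-/over-segmentation mechanism rests on the Tversky loss, whose two weights trade false positives against false negatives. In this sketch, following the common convention, `alpha` weights false positives and `beta` weights false negatives, so a large `beta` pushes a network toward over-segmentation and a large `alpha` toward under-segmentation; `alpha = beta = 0.5` recovers the Dice loss. Averaging the resulting probability maps is a simple stand-in for the marginalization step; the hypernetwork that produces all maps from one model is not shown, and the function names are hypothetical.

```python
def tversky_loss(prob, target, alpha, beta, eps=1e-7):
    """Soft Tversky loss 1 - TP / (TP + alpha*FP + beta*FN) on flattened maps."""
    tp = sum(p * t for p, t in zip(prob, target))
    fp = sum(p * (1 - t) for p, t in zip(prob, target))  # predicted but not labeled
    fn = sum((1 - p) * t for p, t in zip(prob, target))  # labeled but missed
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def marginalize(prob_maps):
    """Voxel-wise average of probability maps from networks trained with different (alpha, beta)."""
    n = len(prob_maps)
    return [sum(vals) / n for vals in zip(*prob_maps)]
```

A prediction with more false positives than false negatives is penalized harder when `alpha > beta`, which is what drives the corresponding network to segment conservatively.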

arXiv:2112.05989

doi 10.1109/JSTSP.2022.3195671

An Overview of Signal Processing Techniques for RIS/IRS-aided Wireless Systems

Authors: Cunhua Pan, Gui Zhou, Kangda Zhi, Sheng Hong, Tuo Wu, Yijin Pan, Hong Ren, Marco Di Renzo, A. Lee Swindlehurst, Rui Zhang, Angela Yingjun Zhang

Abstract: In past as well as present wireless communication systems, the wireless propagation environment is regarded as an uncontrollable black box that impairs the received signal quality, and its negative impacts are compensated for by relying on the design of various sophisticated transmission/reception schemes. However, the improvements achieved by such schemes operating only at the two endpoints (i.e., transmitter and receiver) are limited, even after five generations of wireless systems. Reconfigurable intelligent surface (RIS), or intelligent reflecting surface (IRS), has emerged as a new and revolutionary technology that can configure the wireless environment in a favorable manner by properly tuning the phase shifts of a large number of quasi-passive and low-cost reflecting elements, thus standing out as a promising candidate technology for the next-/sixth-generation (6G) wireless system. However, to reap the performance benefits promised by RIS/IRS, efficient signal processing techniques are crucial for a variety of purposes such as channel estimation, transmission design, and radio localization. In this paper, we provide a comprehensive overview of recent advances in RIS/IRS-aided wireless systems from the signal processing perspective. We also highlight promising research directions that are worthy of investigation in the future.

Submitted 15 December, 2021; v1 submitted 11 December, 2021; originally announced December 2021.

Comments: Invited overview paper for the special issue of RIS/IRS in IEEE JSTSP. Keywords: Reconfigurable intelligent surface (RIS), intelligent reflecting surface (IRS), channel estimation, transmission design, radio localization
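As a concrete instance of tuning the phase shifts of reflecting elements, consider the textbook single-antenna case. This sketch assumes one transmit and one receive antenna, a known direct channel and known cascaded channels through each element, and ideal continuous unit-modulus phase shifts; under these assumptions, choosing each phase to co-phase its reflected path with the direct path maximizes received power. The overview covers far more general settings, and the function names are hypothetical.

```python
import cmath

def align_phases(h_direct, h_cascade):
    """Phase shifts that co-phase every reflected path with the direct path.

    h_direct: direct channel (complex); h_cascade[n]: cascaded channel via element n (complex).
    """
    return [cmath.phase(h_direct) - cmath.phase(c) for c in h_cascade]

def received_gain(h_direct, h_cascade, thetas):
    """|h_d + sum_n c_n * exp(j * theta_n)|^2 for unit-modulus reflection coefficients."""
    total = h_direct + sum(c * cmath.exp(1j * th) for c, th in zip(h_cascade, thetas))
    return abs(total) ** 2
```

With aligned phases every term adds coherently, so the gain equals (|h_d| + Σ|c_n|)², which grows quadratically in the number of elements when the cascaded channels have similar magnitudes.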

arXiv:2111.14023

Optimization of RIS Configurations for Multiple-RIS-Aided mmWave Positioning Systems based on CRLB Analysis

Authors: Yu Liu, Sheng Hong, Cunhua Pan, Yinlu Wang, Yijin Pan, Ming Chen

Abstract: Reconfigurable intelligent surface (RIS) is a promising technology for future millimeter-wave (mmWave) communication systems. However, the potential benefits of adopting RIS for high-precision positioning in mmWave systems are still not well understood. In this paper, we study a multiple-RIS-aided mmWave positioning system and derive the Cramér-Rao lower bound (CRLB). Based on the derived bound, we optimize the phase shifts of the RISs using the particle swarm optimization (PSO) algorithm. Numerical results demonstrate the advantages of using multiple RISs to enhance positioning accuracy in mmWave systems.

Submitted 27 November, 2021; originally announced November 2021.

Comments: Submitted to IEEE Journal
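The PSO step can be illustrated independently of the bound it minimizes. The paper optimizes the RIS phase shifts against its derived CRLB; since that bound is not reproduced here, this sketch uses a hypothetical stand-in objective (the negative received power of a single-antenna link through one RIS, with made-up channel values) purely to show the PSO mechanics. The update is the standard inertia-weight PSO; the parameter values are common defaults, not taken from the paper.

```python
import cmath
import math
import random

def pso(objective, dim, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box [lo, hi]^dim with inertia-weight PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the personal best + pull toward the global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical stand-in objective (NOT the CRLB): negative received power
# as a function of the element phase shifts.
HD = 0.5 + 0.5j                                    # made-up direct channel
CASCADE = [0.1 - 0.2j, -0.3 + 0.1j, 0.05 + 0.05j]  # made-up cascaded channels

def neg_power(thetas):
    total = HD + sum(c * cmath.exp(1j * t) for c, t in zip(CASCADE, thetas))
    return -abs(total) ** 2
```

Because PSO only needs objective evaluations, swapping `neg_power` for a CRLB-based function of the phase shifts would leave the optimizer unchanged.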

arXiv:2111.00195

Learning Continuous Representation of Audio for Arbitrary Scale Super Resolution

Authors: Jaechang Kim, Yunjoo Lee, Seunghoon Hong, Jungseul Ok

Abstract: Audio super resolution aims to predict the missing high-resolution components of low-resolution audio signals. While audio is by nature a continuous signal, current approaches treat it as discrete data (i.e., the input is defined on a discrete time domain) and consider super resolution over a fixed scale factor (i.e., a new neural network must be trained to change the output resolution). To obtain a continuous representation of audio and enable super resolution for arbitrary scale factors, we propose a method of implicit neural representation, coined Local Implicit representation for Super resolution of Arbitrary scale (LISA). Our method locally parameterizes a chunk of audio as a function of continuous time, and represents each chunk with the local latent codes of neighboring chunks so that the function can extrapolate the signal at any time coordinate, i.e., infinite resolution. To learn a continuous representation for audio, we design a self-supervised learning strategy that practices super resolution tasks up to the original resolution by stochastic selection. Our numerical evaluation shows that LISA not only outperforms previous fixed-scale methods with a fraction of the parameters, but is also capable of arbitrary-scale super resolution, even beyond the resolution of the training data.

Submitted 30 March, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

Comments: Accepted by ICASSP 2022. The source code is available at https://github.com/ml-postech/LISA
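A minimal sketch of querying a chunk-wise local representation at arbitrary time coordinates, assuming evenly spaced chunk centers and a hypothetical linear decoder that maps a latent code (value, slope) plus a time offset to a sample. LISA itself learns its latent codes and decoder with neural networks; the linear blending of the two neighboring chunks' predictions below is an illustrative assumption, and `decode` and `query` are hypothetical names, not the paper's architecture or code.

```python
import numpy as np

def decode(code, dt):
    """Hypothetical local decoder: a code (value, slope) predicts value + slope * dt."""
    return code[..., 0] + code[..., 1] * dt

def query(codes, centers, t):
    """Evaluate the continuous signal at arbitrary times t by blending the two neighboring chunks."""
    t = np.clip(np.asarray(t, dtype=float), centers[0], centers[-1])
    h = centers[1] - centers[0]                      # assumes uniformly spaced chunk centers
    i = np.clip(((t - centers[0]) // h).astype(int), 0, len(centers) - 2)
    left, right = centers[i], centers[i + 1]
    w_left = (right - t) / h                         # linear blending weights that sum to 1
    w_right = (t - left) / h
    return w_left * decode(codes[i], t - left) + w_right * decode(codes[i + 1], t - right)

centers = np.linspace(0.0, 1.0, 5)                   # 5 chunk centers on [0, 1]
codes = np.stack([2.0 * centers + 1.0, np.full_like(centers, 2.0)], axis=1)  # codes for s(t) = 2t + 1
dense = query(codes, centers, np.linspace(0.0, 1.0, 37))  # 37 samples: an arbitrary output rate
```

Because `query` accepts any time grid, the same codes can be sampled at any output rate, which is the property the abstract describes as arbitrary-scale super resolution.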

arXiv:2109.08908

Intra-Inter Subject Self-supervised Learning for Multivariate Cardiac Signals

Authors: Xiang Lan, Dianwen Ng, Shenda Hong, Mengling Feng

Abstract: Effectively learning information-rich and generalizable representations from unlabeled multivariate cardiac signals to identify abnormal heart rhythms (cardiac arrhythmias) is valuable in real-world clinical settings but often challenging due to the signals' complex temporal dynamics. Cardiac arrhythmias can vary significantly in temporal patterns even for the same patient (i.e., intra-subject difference). Meanwhile, the same type of cardiac arrhythmia can show different temporal patterns among different patients due to different cardiac structures (i.e., inter-subject difference). In this paper, we address these challenges by proposing an Intra-inter Subject self-supervised Learning (ISL) model customized for multivariate cardiac signals. Our proposed ISL model integrates medical knowledge into self-supervision to learn effectively from intra- and inter-subject differences. In intra-subject self-supervision, the ISL model first extracts heartbeat-level features from each subject using a channel-wise attentional CNN-RNN encoder. Then a stationarity test module is employed to capture the temporal dependencies between heartbeats. In inter-subject self-supervision, we design a set of data augmentations according to the clinical characteristics of cardiac signals and perform contrastive learning among subjects to learn distinctive representations for various types of patients. Extensive experiments were conducted on three real-world datasets. In a semi-supervised transfer learning scenario, our pre-trained ISL model yields about a 10% improvement over supervised training when only 1% of the data is labeled, suggesting strong generalizability and robustness of the model.

Submitted 18 September, 2021; originally announced September 2021.

Comments: preliminary version
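The inter-subject stage contrasts augmented views of different subjects. A minimal sketch of a one-directional InfoNCE-style contrastive loss, assuming each row of `z_a` and the same row of `z_b` are representations of two augmented views of one subject, with all other rows acting as negatives. The Gaussian jitter used as the augmentation, the temperature of 0.1, and the name `info_nce` are illustrative assumptions, not ISL's clinically motivated augmentations or its exact objective.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss: row k of z_a should be most similar to row k of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)   # unit-normalize representations
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                       # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)      # stabilize the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                       # positive pairs sit on the diagonal

rng = np.random.default_rng(0)
subjects = rng.normal(size=(8, 16))                          # stand-in encoder outputs for 8 subjects
view = subjects + 0.05 * rng.normal(size=subjects.shape)     # hypothetical jitter augmentation
aligned = info_nce(subjects, view)
mismatched = info_nce(subjects, np.roll(view, 1, axis=0))    # positives deliberately misaligned
```

The loss is small when each subject's two views agree and large when the pairing is shuffled, which is the pressure that pushes representations of different subjects apart.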

arXiv:2107.14502

Collaboration in the Sky: A Distributed Framework for Task Offloading and Resource Allocation in Multi-Access Edge Computing

Authors: Yan Kyaw Tun, Tri Nguyen Dang, Kitae Kim, Madyan Anselwi, Walid Saad, Choong Seon Hong

Abstract: Recently, unmanned aerial vehicle (UAV)-assisted multi-access edge computing (MEC) systems have emerged as a promising solution for providing computation services to mobile users outside terrestrial infrastructure coverage. However, because each UAV operates independently, it is challenging to meet the computation demands of the mobile users given the limited computing capacity of each UAV's MEC server and the UAV's energy constraint. Therefore, collaboration among UAVs is needed. In this paper, a collaborative multi-UAV-assisted MEC system integrated with a MEC-enabled terrestrial base station (BS) is proposed. Then, the problem of minimizing the total latency experienced by the mobile users in the proposed system is studied by optimizing the offloading decision as well as the allocation of communication and computing resources, while satisfying the energy constraints of both the mobile users and the UAVs. The problem is shown to be a non-convex mixed-integer nonlinear problem (MINLP) that is intractable. Therefore, the formulated problem is decomposed into three subproblems: i) the users' task offloading decision problem, ii) the communication resource allocation problem, and iii) the UAV-assisted MEC decision problem. Then, Lagrangian relaxation and the alternating direction method of multipliers (ADMM) are applied to solve the decomposed problems alternately. Simulation results show that the proposed approach reduces the average latency by up to 40.7% and 4.3% compared to the greedy and exhaustive search methods, respectively.

Submitted 30 July, 2021; originally announced July 2021.

Comments: Submitted to IEEE Internet of Things Journal
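The abstract decomposes the latency-minimization problem and applies Lagrangian relaxation. To illustrate that technique on a much simpler, hypothetical subproblem (not the paper's formulation), suppose one MEC server with capacity F cycles/s must be split among offloaded tasks requiring c_i cycles, minimizing the total computing latency sum_i c_i / f_i subject to sum_i f_i = F. Setting the derivative of the Lagrangian sum_i c_i / f_i + lambda (sum_i f_i - F) to zero gives f_i = sqrt(c_i / lambda), and the budget then yields f_i = F sqrt(c_i) / sum_j sqrt(c_j). The function names and task sizes below are assumptions for illustration.

```python
import numpy as np

def allocate_cpu(cycles, capacity):
    """Split capacity (cycles/s) across tasks to minimize sum(c_i / f_i) subject to sum(f_i) = capacity."""
    root = np.sqrt(np.asarray(cycles, dtype=float))
    # Stationarity: -c_i / f_i**2 + lam = 0 gives f_i = sqrt(c_i / lam);
    # the budget constraint then fixes sqrt(lam) = sum(sqrt(c)) / capacity.
    return capacity * root / root.sum()

def total_latency(cycles, freqs):
    """Sum of per-task computing delays c_i / f_i, in seconds."""
    return float(np.sum(np.asarray(cycles, dtype=float) / np.asarray(freqs, dtype=float)))

cycles = [1e9, 4e9, 9e9]                  # hypothetical task sizes in CPU cycles
optimal = allocate_cpu(cycles, 1.2e10)    # approximately [2e9, 4e9, 6e9], total latency 3.0 s
equal = np.full(3, 4e9)                   # naive equal split, total latency 3.5 s
```

Since each c_i / f_i is convex for f_i > 0, this stationary point is the global minimum of the toy problem. The sketch does not model offloading decisions, communication resources, energy constraints, or the ADMM stage.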

arXiv:2107.13203

Collision-free Formation Control of Multiple Nano-quadrotors

Authors: Anh Tung Nguyen, Ji-Won Lee, Thanh Binh Nguyen, Sung Kyung Hong

Abstract: The utilisation of unmanned aerial vehicles has grown significantly in real-world applications, including surveillance tasks, military missions, and transportation deliveries. This letter investigates practical problems of formation control for multiple nano-quadrotor systems. Specifically, the first aim of this work is to develop a theoretical framework for the time-varying formation flight of the multi-quadrotor system with collision avoidance. To achieve this goal, a finite cut-off potential function is used to avoid collisions among vehicles in the group as well as between vehicles and an obstacle. The control algorithm navigates the group of nano-quadrotors to asymptotically reach a desired time-varying formation. The second aim is to implement the proposed algorithm on Crazyflie nano-quadrotors, one of the most ubiquitous indoor experimentation platforms. Several practical scenarios are conducted to demonstrate the anti-collision abilities among group members as well as between vehicles and an obstacle. The experimental outcomes validate the effectiveness of the proposed method in formation tracking and collision avoidance for multiple nano-quadrotors.

Submitted 28 July, 2021; originally announced July 2021.
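The abstract relies on a finite cut-off potential function to keep vehicles apart. A minimal sketch, assuming a common repulsive form U(d) = 0.5 k (1/d - 1/R)^2 for d < R and U(d) = 0 otherwise, where d is the distance between two vehicles, k is a gain, and R is a hypothetical cut-off radius; the paper's exact potential and gains may differ. The hypothetical `repulsive_forces` returns the negative gradient of the summed pairwise potential for each vehicle, a term a formation controller could add to its tracking law; an obstacle can be modeled as one more point in the list.

```python
import numpy as np

def repulsive_forces(positions, radius, gain=1.0):
    """Negative gradient of sum_j U(||p_i - p_j||) for each vehicle i (assumed cut-off potential)."""
    p = np.asarray(positions, dtype=float)
    forces = np.zeros_like(p)
    for i in range(len(p)):
        for j in range(len(p)):
            if i == j:
                continue
            diff = p[i] - p[j]
            d = np.linalg.norm(diff)
            if d < radius:  # beyond the cut-off radius the potential, and hence the force, is zero
                # -dU/dd = gain * (1/d - 1/R) / d**2, applied along the unit vector from j to i
                forces[i] += gain * (1.0 / d - 1.0 / radius) / d**2 * (diff / d)
    return forces

# Two nano-quadrotors 0.3 m apart and a third 5 m away, with a 0.5 m cut-off radius.
f = repulsive_forces([[0.0, 0.0, 1.0], [0.3, 0.0, 1.0], [5.0, 0.0, 1.0]], radius=0.5)
```

Because U and its slope both vanish at d = R, the repulsion switches on smoothly only when vehicles come close, leaving the formation-tracking term alone elsewhere.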

arXiv:2107.09700

3D-StyleGAN: A Style-Based Generative Adversarial Network for Generative Modeling of Three-Dimensional Medical Images

Authors: Sungmin Hong, Razvan Marinescu, Adrian V. Dalca, Anna K. Bonkhoff, Martin Bretzner, Natalia S. Rost, Polina Golland

Abstract: Image synthesis of three-dimensional (3D) medical images via Generative Adversarial Networks (GANs) has great potential that can be extended to many medical applications, such as image enhancement and disease progression modeling. However, current GAN technologies for 3D medical image synthesis need to be significantly improved before they can be readily adapted to real-world medical problems. In this paper, we extend the state-of-the-art StyleGAN2 model, which natively works with two-dimensional images, to enable 3D image synthesis. In addition to image synthesis, we investigate the controllability and interpretability of the 3D-StyleGAN via style vectors inherited from the original StyleGAN2 that are highly suitable for medical applications: (i) latent space projection and reconstruction of unseen real images, and (ii) style mixing. We demonstrate the 3D-StyleGAN's performance and feasibility with ~12,000 three-dimensional full-brain MR T1 images, although it can be applied to any 3D volumetric images. Furthermore, we explore different hyperparameter configurations to investigate potential improvement of image synthesis with larger networks. The code and pre-trained networks are available online: https://github.com/sh4174/3DStyleGAN.

Submitted 20 July, 2021; originally announced July 2021.

Comments: 11 pages, 6 figures, 2 tables. Provisionally Accepted at DGM4MICCAI workshop in MICCAI 2021

MSC Class: 68T07 (Primary), 68T01 (Secondary)

ACM Class: I.2; I.4
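Style mixing, one of the two controllability tools the abstract inherits from StyleGAN2, feeds different latent style vectors to different synthesis layers. A minimal sketch of building the per-layer style inputs, assuming `num_layers` style inputs ordered from coarse to fine and a single crossover index; `mix_styles` is a hypothetical helper, the generator that consumes these styles is not shown, and the 8-dimensional vectors are placeholders rather than the model's real latent size.

```python
import numpy as np

def mix_styles(w_coarse, w_fine, num_layers, crossover):
    """Return a (num_layers, dim) array: layers [0, crossover) take w_coarse, the rest take w_fine."""
    w_coarse = np.asarray(w_coarse, dtype=float)
    w_fine = np.asarray(w_fine, dtype=float)
    if w_coarse.shape != w_fine.shape or not 0 <= crossover <= num_layers:
        raise ValueError("style vectors must match and crossover must lie in [0, num_layers]")
    styles = np.tile(w_fine, (num_layers, 1))   # start with the fine-detail source at every layer
    styles[:crossover] = w_coarse               # overwrite the coarse (low-resolution) layers
    return styles

rng = np.random.default_rng(1)
w_a, w_b = rng.normal(size=8), rng.normal(size=8)   # placeholder latents for two volumes
mixed = mix_styles(w_a, w_b, num_layers=10, crossover=4)
```

In StyleGAN-family generators, coarse layers tend to govern global structure and fine layers local detail, so moving the crossover index shifts how much of each source appears in the synthesized volume.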

arXiv:2107.02520

Deep Learning Methods for Joint Optimization of Beamforming and Fronthaul Quantization in Cloud Radio Access Networks

Authors: Daesung Yu, Hoon Lee, Seok-Hwan Park, Seung-Eun Hong

Abstract: Cooperative beamforming across access points (APs) and fronthaul quantization strategies are essential for cloud radio access network (C-RAN) systems. The nonconvexity of C-RAN optimization problems, which stems from per-AP power and fronthaul capacity constraints, requires iterative algorithms with high computational complexity. To resolve this issue, we investigate a deep learning approach in which the optimization module is replaced with a well-trained deep neural network (DNN). An efficient learning solution is proposed that constructs a DNN to produce a low-dimensional representation of the optimal beamforming and quantization strategies. Numerical results validate the advantages of the proposed learning solution.

Submitted 6 July, 2021; originally announced July 2021.

Comments: Accepted for publication in IEEE Wireless Communications Letters
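Learned beamformers must still satisfy the per-AP power constraints named in the abstract. One simple way to enforce them on a network's raw output, shown here as an assumption rather than the paper's design, is to rescale each AP's block of beamforming coefficients whenever that AP exceeds its power budget. In the sketch, `v` holds one row per antenna and one column per user, and `project_per_ap_power` and `ap_of_antenna` are hypothetical names; the random matrix stands in for a DNN output.

```python
import numpy as np

def project_per_ap_power(v, ap_of_antenna, p_max):
    """Scale each AP's rows of v so that AP's total transmit power is at most p_max."""
    v = np.array(v, dtype=complex)              # copy so the caller's array is untouched
    ap_of_antenna = np.asarray(ap_of_antenna)
    for ap in np.unique(ap_of_antenna):
        rows = ap_of_antenna == ap
        power = np.sum(np.abs(v[rows]) ** 2)    # sum of |v|^2 over this AP's antennas and all users
        if power > p_max:
            v[rows] *= np.sqrt(p_max / power)   # uniform scaling that brings the power to exactly p_max
    return v

rng = np.random.default_rng(2)
raw = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))  # stand-in output: 4 antennas, 3 users
raw[2:] *= 0.01                                                # make AP 1 already feasible
v = project_per_ap_power(raw, ap_of_antenna=[0, 0, 1, 1], p_max=1.0)
```

Uniform scaling keeps each AP's beam directions unchanged and only shrinks their magnitude; fronthaul quantization, the other half of the paper's problem, is not modeled here.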

Showing 1–50 of 93 results for author: Hong, S