doi 10.1016/j.compbiomed.2024.108746

Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans

Authors: Kwang-Hyun Uhm, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

Abstract: Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and ability to characterize renal lesions. However, since enhancement patterns of renal lesions across CT phases are different even for the same lesion type, the visual assessment by radiologists suffers from inter-observer variability in clinical practice. Although deep learning-based approaches have been recently explored for differential diagnosis of kidney cancer, they do not explicitly model the relationships between CT phases in the network design, limiting the diagnostic performance. In this paper, we propose a novel lesion-aware cross-phase attention network (LACPANet) that can effectively capture temporal dependencies of renal lesions across CT phases to accurately classify the lesions into five major pathological subtypes from time-series multi-phase CT images. We introduce a 3D inter-phase lesion-aware attention mechanism to learn effective 3D lesion features that are used to estimate attention weights describing the inter-phase relations of the enhancement patterns. We also present a multi-scale attention scheme to capture and aggregate temporal patterns of lesion features at different spatial scales for further improvement. Extensive experiments on multi-phase CT scans of kidney cancer patients from the collected dataset demonstrate that our LACPANet outperforms state-of-the-art approaches in diagnostic accuracy.

Submitted 24 June, 2024; originally announced June 2024.

Comments: This article has been accepted for publication in Computers in Biology and Medicine

Journal ref: Computers in Biology and Medicine, 108746, 2024

arXiv:2405.18701 [pdf, other]

Near-Field Localization with RIS via Two-Dimensional Signal Path Classification

Authors: Jeongwan Kang, Seung-Woo Ko, Sunwoo Kim

Abstract: In this paper, we propose two-dimensional signal path classification (2D-SPC) for reconfigurable intelligent surface (RIS)-assisted near-field (NF) localization. In the NF regime, multiple RIS-driven signal paths (SPs) can contribute to precise localization if these are decomposable and the reflected locations on the RIS are known, referred to as SP decomposition (SPD) and SP labeling (SPL), respectively. To this end, each RIS element modulates the incoming SP's phase by shifting it by one of the values in the phase shift profile (PSP) lists satisfying resolution requirements. By interworking with a conventional orthogonal frequency division multiplexing (OFDM) waveform, the user equipment can construct a 2D spectrum map that couples each SP's time of arrival (ToA) and PSP. Then, we design SPL by mapping SPs with the corresponding reflected RIS elements when they share the same PSP. Given two unlabeled SPs, we derive a geometric discriminant for checking whether the current label is correct. It can be extended to more than three SPs by sorting them using pairwise geometric discriminants between adjacent ones. From simulation results, it has been demonstrated that the proposed 2D-SPC achieves consistent localization accuracy even if insufficient PSPs are given.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 15 pages, 12 figures, Submitted to IEEE Transactions on Wireless Communications

arXiv:2312.05548 [pdf, other]

doi 10.1109/JBHI.2022.3219123

A Unified Multi-Phase CT Synthesis and Classification Framework for Kidney Cancer Diagnosis with Incomplete Data

Authors: Kwang-Hyun Uhm, Seung-Won Jung, Moon Hyung Choi, Sung-Hoo Hong, Sung-Jea Ko

Abstract: Multi-phase CT is widely adopted for the diagnosis of kidney cancer due to the complementary information among phases. However, the complete set of multi-phase CT is often not available in practical clinical applications. In recent years, there have been some studies to generate the missing modality image from the available data. Nevertheless, the generated images are not guaranteed to be effective for the diagnosis task. In this paper, we propose a unified framework for kidney cancer diagnosis with incomplete multi-phase CT, which simultaneously recovers missing CT images and classifies cancer subtypes using the completed set of images. The advantage of our framework is that it encourages a synthesis model to explicitly learn to generate missing CT phases that are helpful for classifying cancer subtypes. We further incorporate a lesion segmentation network into our framework to exploit lesion-level features for effective cancer classification in the whole CT volumes. The proposed framework is based on fully 3D convolutional neural networks to jointly optimize both synthesis and classification of 3D CT volumes. Extensive experiments on both in-house and external datasets demonstrate the effectiveness of our framework for the diagnosis with incomplete data compared with state-of-the-art baselines. In particular, cancer subtype classification using the completed CT data by our method achieves higher performance than the classification using the given incomplete data.

Submitted 9 December, 2023; originally announced December 2023.

Comments: This article has been accepted for publication in IEEE Journal of Biomedical and Health Informatics

Journal ref: JBHI, 2022

arXiv:2312.05528 [pdf, other]

Exploring 3D U-Net Training Configurations and Post-Processing Strategies for the MICCAI 2023 Kidney and Tumor Segmentation Challenge

Authors: Kwang-Hyun Uhm, Hyunjun Cho, Zhixin Xu, Seohoon Lim, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

Abstract: In 2023, it is estimated that 81,800 kidney cancer cases will be newly diagnosed, and 14,890 people will die from this cancer in the United States. Preoperative dynamic contrast-enhanced abdominal computed tomography (CT) is often used for detecting lesions. However, there exists inter-observer variability due to subtle differences in the imaging features of kidney and kidney tumors. In this paper, we explore various 3D U-Net training configurations and effective post-processing strategies for accurate segmentation of kidneys, cysts, and kidney tumors in CT images. We validated our model on the dataset of the 2023 Kidney and Kidney Tumor Segmentation (KiTS23) challenge. Our method took second place in the final ranking of the KiTS23 challenge on unseen test data with an average Dice score of 0.820 and an average Surface Dice of 0.712.

Submitted 9 December, 2023; originally announced December 2023.

Comments: MICCAI 2023, KITS 2023 challenge 2nd place
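
Note: the entry above reports segmentation quality as an average Dice score. As a rough illustration of that metric only (not the challenge's official evaluation code), the Dice score between two binary masks can be sketched as follows. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for this example.

```python
import numpy as np

def dice_score(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / denom

# Toy 2x2 masks: one overlapping voxel, three foreground voxels in total.
print(dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

The same function works unchanged on 3D volumes, since it only counts foreground voxels.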

arXiv:2311.08271 [pdf, other]

Mobility-Induced Graph Learning for WiFi Positioning

Authors: Kyuwon Han, Seung Min Yu, Seong-Lyun Kim, Seung-Woo Ko

Abstract: A smartphone-based user mobility tracking could be effective in finding his/her location, while the unpredictable error therein due to low specification of built-in inertial measurement units (IMUs) rejects its standalone usage but demands the integration to another positioning technique like WiFi positioning. This paper aims to propose a novel integration technique using a graph neural network called Mobility-INduced Graph LEarning (MINGLE), which is designed based on two types of graphs made by capturing different user mobility features. Specifically, considering sequential measurement points (MPs) as nodes, a user's regular mobility pattern allows us to connect neighbor MPs as edges, called time-driven mobility graph (TMG). Second, a user's relatively straight transition at a constant pace when moving from one position to another can be captured by connecting the nodes on each path, called a direction-driven mobility graph (DMG). Then, we can design graph convolution network (GCN)-based cross-graph learning, where two different GCN models for TMG and DMG are jointly trained by feeding different input features created by WiFi RTTs yet sharing their weights. Besides, the loss function includes a mobility regularization term such that the differences between adjacent location estimates should be less variant due to the user's stable moving pace. Noting that the regularization term does not require ground-truth location, MINGLE can be designed under semi- and self-supervised learning frameworks. The proposed MINGLE's effectiveness is extensively verified through field experiments, showing a better positioning accuracy than benchmarks, say root mean square errors (RMSEs) being 1.398 (m) and 1.073 (m) for self- and semi-supervised learning cases, respectively.

Submitted 14 November, 2023; originally announced November 2023.

Comments: submitted to a possible IEEE journal

arXiv:2309.07152 [pdf]

Novel Smart N95 Filtering Facepiece Respirator with Real-time Adaptive Fit Functionality and Wireless Humidity Monitoring for Enhanced Wearable Comfort

Authors: Kangkyu Kwon, Yoon Jae Lee, Yeongju Jung, Ira Soltis, Chanyeong Choi, Yewon Na, Lissette Romero, Myung Chul Kim, Nathan Rodeheaver, Hodam Kim, Michael S. Lloyd, Ziqing Zhuang, William King, Susan Xu, Seung-Hwan Ko, **woo Lee, Woon-Hong Yeo

Abstract: The widespread emergence of the COVID-19 pandemic has transformed our lifestyle, and facial respirators have become an essential part of daily life. Nevertheless, the current respirators possess several limitations such as poor respirator fit because they are incapable of covering diverse human facial sizes and shapes, potentially diminishing the effect of wearing respirators. In addition, the current facial respirators do not inform the user of the air quality within the smart facepiece respirator in case of continuous long-term use. Here, we demonstrate the novel smart N-95 filtering facepiece respirator that incorporates the humidity sensor and pressure sensory feedback-enabled self-fit adjusting functionality for the effective performance of the facial respirator to prevent the transmission of airborne pathogens. The laser-induced graphene (LIG) constitutes the humidity sensor, and the pressure sensor array based on the dielectric elastomeric sponge monitors the respirator contact on the face of the user, providing the sensory information for a closed-loop feedback mechanism. As a result of the self-fit adjusting mode along with elastomeric lining, the fit factor is increased by 3.20 and 5 times at average and maximum respectively. We expect that the experimental proof-of-concept of this work will offer viable solutions to the current commercial respirators to address the limitations.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 20 pages, 5 figures, 1 table, submitted for possible publication

MSC Class: 92C55

arXiv:2211.07860 [pdf, other]

Enabling AI Quality Control via Feature Hierarchical Edge Inference

Authors: **hyuk Choi, Seong-Lyun Kim, Seung-Woo Ko

Abstract: With the rise of edge computing, various AI services are expected to be available at a mobile side through the inference based on deep neural network (DNN) operated at the network edge, called edge inference (EI). On the other hand, the resulting AI quality (e.g., mean average precision in object detection) has been regarded as a given factor, and AI quality control has yet to be explored despite its importance in addressing the diverse demands of different users. This work aims at tackling the issue by proposing a feature hierarchical EI (FHEI), comprising feature network and inference network deployed at an edge server and corresponding mobile, respectively. Specifically, feature network is designed based on feature hierarchy, a one-directional feature dependency with a different scale. A higher scale feature requires more computation and communication loads while it provides a better AI quality. The tradeoff enables FHEI to control AI quality gradually w.r.t. communication and computation loads, leading to deriving a near-to-optimal solution to maximize multi-user AI quality under the constraints of uplink & downlink transmissions and edge server and mobile computation capabilities. It is verified by extensive simulations that the proposed joint communication-and-computation control on FHEI architecture always outperforms several benchmarks by differentiating each user's AI quality depending on the communication and computation conditions.

Submitted 14 November, 2022; originally announced November 2022.

Comments: 7 pages, 6 figures, Conference Version

arXiv:2211.06225 [pdf, other]

Over-the-Air Consensus for Distributed Vehicle Platooning Control (Extended version)

Authors: Jihoon Lee, Yonghoon Jang, Hansol Kim, Seong-Lyun Kim, Seung-Woo Ko

Abstract: A distributed control of vehicle platooning is referred to as distributed consensus (DC) since many autonomous vehicles (AVs) reach a consensus to move as one body with the same velocity and inter-distance. For DC control to be stable, other AVs' real-time position information should be inputted to each AV's controller via vehicle-to-vehicle (V2V) communications. On the other hand, too many V2V links should be simultaneously established and frequently retrained, causing frequent packet loss and longer communication latency. We propose a novel DC algorithm called over-the-air consensus (AirCons), a joint communication-and-control design with two key features to overcome the above limitations. First, exploiting a wireless signal's superposition and broadcasting properties renders all AVs' signals to converge to a specific value proportional to participating AVs' average position without individual V2V channel information. Second, the estimated average position is used to control each AV's dynamics instead of each AV's individual position. Through analytic and numerical studies, the effectiveness of the proposed AirCons designed on the state-of-the-art New Radio architecture is verified by showing a 14.22% control gain compared to the benchmark without the average position.

Submitted 11 November, 2022; originally announced November 2022.

Comments: This work has been submitted to the IEEE for possible publication
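
Note: the abstract above relies on signal superposition yielding a value proportional to the vehicles' average position. The toy sketch below illustrates only that averaging idea under strong assumptions that are not taken from the paper: an ideal channel with unit gains, perfectly synchronized transmissions, amplitudes equal to scalar positions, and optional additive Gaussian noise. The function name `over_the_air_average` and its parameters are hypothetical.

```python
import numpy as np

def over_the_air_average(positions, noise_std=0.0, seed=0):
    """Estimate the mean position from one superimposed (summed) signal."""
    positions = np.asarray(positions, dtype=float)
    # Superposition: the receiver observes the sum of all transmitted amplitudes,
    # never the individual values.
    received = positions.sum()
    if noise_std > 0.0:
        rng = np.random.default_rng(seed)
        received += rng.normal(0.0, noise_std)
    # Dividing by the number of participants recovers the average position.
    return received / positions.size

# Three vehicles at 0 m, 10 m, and 20 m along a lane (illustrative values).
print(over_the_air_average([0.0, 10.0, 20.0]))
```

In this idealized setting a single reception replaces the pairwise exchanges that would otherwise be needed to compute the same average.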

arXiv:2011.12713 [pdf]

A Secure Deep Probabilistic Dynamic Thermal Line Rating Prediction

Authors: N. Safari, S. M. Mazhari, C. Y. Chung, S. B. Ko

Abstract: Accurate short-term prediction of overhead line (OHL) transmission ampacity can directly affect the efficiency of power system operation and planning. Any overestimation of the dynamic thermal line rating (DTLR) can lead to lifetime degradation and failure of OHLs, safety hazards, etc. This paper presents a secure yet sharp probabilistic prediction model for the hour-ahead forecasting of the DTLR. The security of the proposed DTLR limits the frequency of DTLR prediction exceeding the actual DTLR. The model is based on an augmented deep learning architecture that makes use of a wide range of predictors, including historical climatology data and latent variables obtained during DTLR calculation. Furthermore, by introducing a customized cost function, the deep neural network is trained to consider the DTLR security based on the required probability of exceedance while minimizing deviations of the predicted DTLRs from the actual values. The proposed probabilistic DTLR is developed and verified using recorded experimental data. The simulation results validate the superiority of the proposed DTLR compared to state-of-the-art prediction models using well-known evaluation metrics.

Submitted 21 November, 2020; originally announced November 2020.

Comments: The work is accepted for publication in Journal of Modern Power Systems and Clean Energy

arXiv:2011.03698 [pdf, other]

Exploiting User Mobility for WiFi RTT Positioning: A Geometric Approach

Authors: Kyuwon Han, Seung Min Yu, Seong-Lyun Kim, Seung-Woo Ko

Abstract: Recently, round-trip time (RTT) measured by a fine-timing measurement protocol has received great attention in the area of WiFi positioning. It provides an acceptable ranging accuracy in favorable environments when a line-of-sight (LOS) path exists. Otherwise, a signal is detoured along with non-LOS paths, making the resultant ranging results different from the ground-truth, called an RTT bias, which is the main reason for poor positioning performance. To address it, we aim at leveraging the user mobility trajectory detected by a smartphone's inertial measurement units, called pedestrian dead reckoning (PDR). Specifically, PDR provides the geographic relation among adjacent locations, guiding the resultant positioning estimates' sequence not to deviate from the user trajectory. To this end, we describe their relations as multiple geometric equations, enabling us to render a novel positioning algorithm with acceptable accuracy. Depending on the mobility pattern being linear or arbitrary, we develop different algorithms divided into two phases. First, we can jointly estimate an RTT bias of each AP and the user's step length by leveraging the geometric relation mentioned above. It enables us to construct a user's relative trajectory defined on the concerned AP's local coordinate system. Second, we align every AP's relative trajectory into a single one, called trajectory alignment, equivalent to transformation to the global coordinate system. As a result, we can estimate the sequence of the user's absolute locations from the aligned trajectory. Various field experiments extensively verify the proposed algorithm's effectiveness that the average positioning error is approximately 0.369 (m) and 1.705 (m) in LOS and NLOS environments, respectively.

Submitted 31 March, 2021; v1 submitted 7 November, 2020; originally announced November 2020.

arXiv:2009.09827 [pdf]

doi 10.1148/ryai.200231

Radiologist-level Performance by Using Deep Learning for Segmentation of Breast Cancers on MRI Scans

Authors: Lukas Hirsch, Yu Huang, Shaojun Luo, Carolina Rossi Saccarelli, Roberto Lo Gullo, Isaac Daimiel Naranjo, Almir G. V. Bitencourt, Natsuko Onishi, Eun Sook Ko, Doris Leithner, Daly Avendano, Sarah Eskreis-Winkler, Mary Hughes, Danny F. Martinez, Katja Pinker, Krishna Juluru, Amin E. El-Rowmeim, Pierre Elnajjar, Elizabeth A. Morris, Hernan A. Makse, Lucas C Parra, Elizabeth J. Sutton

Abstract: Purpose: To develop a deep network architecture that would achieve fully automated radiologist-level segmentation of cancers at breast MRI. Materials and Methods: In this retrospective study, 38229 examinations (composed of 64063 individual breast scans from 14475 patients) were performed in female patients (age range, 12-94 years; mean age, 52 years +/- 10 [standard deviation]) who presented between 2002 and 2014 at a single clinical site. A total of 2555 breast cancers were selected that had been segmented on two-dimensional (2D) images by radiologists, as well as 60108 benign breasts that served as examples of noncancerous tissue; all these were used for model training. For testing, an additional 250 breast cancers were segmented independently on 2D images by four radiologists. Authors selected among several three-dimensional (3D) deep convolutional neural network architectures, input modalities, and harmonization methods. The outcome measure was the Dice score for 2D segmentation, which was compared between the network and radiologists by using the Wilcoxon signed rank test and the two one-sided test procedure. Results: The highest-performing network on the training set was a 3D U-Net with dynamic contrast-enhanced MRI as input and with intensity normalized for each examination. In the test set, the median Dice score of this network was 0.77 (interquartile range, 0.26). The performance of the network was equivalent to that of the radiologists (two one-sided test procedures with radiologist performance of 0.69-0.84 as equivalence bounds, P <= .001 for both; n = 250). Conclusion: When trained on a sufficiently large dataset, the developed 3D U-Net performed as well as fellowship-trained radiologists in detailed 2D segmentation of breast cancers at routine clinical MRI.

Submitted 12 April, 2022; v1 submitted 21 September, 2020; originally announced September 2020.

arXiv:2009.01871 [pdf, other]

doi 10.1007/978-3-030-60548-3_18

Federated Learning for Breast Density Classification: A Real-World Implementation

Authors: Holger R. Roth, Ken Chang, Praveer Singh, Nir Neumark, Wenqi Li, Vikash Gupta, Sharut Gupta, Liangqiong Qu, Alvin Ihsani, Bernardo C. Bizzo, Yuhong Wen, Varun Buch, Meesam Shah, Felipe Kitamura, Matheus Mendonça, Vitor Lavor, Ahmed Harouni, Colin Compas, Jesse Tetreault, Prerna Dogra, Yan Cheng, Selnur Erdal, Richard White, Behrooz Hashemian, Thomas Schultz, et al. (18 additional authors not shown)

Abstract: Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on Breast Imaging, Reporting & Data System (BI-RADS). We show that despite substantial differences among the datasets from all sites (mammography system, class distribution, and data set size) and without centralizing data, we can successfully train AI models in federation. The results show that models trained using FL perform 6.3% on average better than their counterparts trained on an institute's local data alone. Furthermore, we show a 45.8% relative improvement in the models' generalizability when evaluated on the other participating sites' testing data.

Submitted 20 October, 2020; v1 submitted 3 September, 2020; originally announced September 2020.

Comments: Accepted at the 1st MICCAI Workshop on "Distributed And Collaborative Learning"; add citation to Fig. 1 & 2 and update Fig. 5; fix typo in affiliations

Journal ref: In: Albarqouni S. et al. (eds) Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. DART 2020, DCL 2020. Lecture Notes in Computer Science, vol 12444. Springer, Cham

arXiv:2008.08906 [pdf, other]

Cooperative Multi-Point Vehicular Positioning Using Millimeter-Wave Surface Reflection (Extended version)

Authors: Zezhong Zhang, Seung-Woo Ko, Rui Wang, Kaibin Huang

Abstract: Multi-point vehicular positioning is one essential operation for autonomous vehicles. However, the state-of-the-art positioning technologies, relying on reflected signals from a target (i.e., RADAR and LIDAR), cannot work without line-of-sight. Besides, it takes significant time for environment scanning and object recognition with potential detection inaccuracy, especially in complex urban situations. Some recent fatal accidents involving autonomous vehicles further expose such limitations. In this paper, we aim at overcoming these limitations by proposing a novel relative positioning approach, called Cooperative Multi-point Positioning (COMPOP). The COMPOP establishes cooperation between a target vehicle (TV) and a sensing vehicle (SV) if a LoS path exists, where a TV explicitly lets an SV know the TV's existence by transmitting positioning waveforms. This cooperation makes it possible to remove the time-consuming scanning and target recognizing processes, facilitating real-time positioning. One prerequisite for the cooperation is a clock synchronization between a pair of TV and SV. To this end, we use a phase-differential-of-arrival based approach to remove the TV-SV clock difference from the received signal. With clock difference correction, the TV's position can be obtained via peak detection over a 3D power spectrum constructed by a Fourier transform (FT) based algorithm. The COMPOP also incorporates nearby vehicles, without knowing their locations, into the above cooperation for the case without a LoS path. The effectiveness of the COMPOP is verified by several simulations concerning practical channel parameters.

Submitted 21 August, 2020; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: 34 pages, 13 figures

arXiv:2006.13807 [pdf, other]

COVID-CXNet: Detecting COVID-19 in Frontal Chest X-ray Images using Deep Learning

Authors: Arman Haghanifar, Mahdiyar Molahasani Majdabadi, Younhee Choi, S. Deivalakshmi, Seokbum Ko

Abstract: One of the primary clinical observations for screening the infectious by the novel coronavirus is capturing a chest x-ray image. In most of the patients, a chest x-ray contains abnormalities, such as consolidation, which are the results of COVID-19 viral pneumonia. In this study, research is conducted on efficiently detecting imaging features of this type of pneumonia using deep convolutional neural networks in a large dataset. It is demonstrated that simple models, alongside the majority of pretrained networks in the literature, focus on irrelevant features for decision-making. In this paper, numerous chest x-ray images from various sources are collected, and the largest publicly accessible dataset is prepared. Finally, using the transfer learning paradigm, the well-known CheXNet model is utilized for developing COVID-CXNet. This powerful model is capable of detecting the novel coronavirus pneumonia based on relevant and meaningful features with precise localization. COVID-CXNet is a step towards a fully automated and robust COVID-19 detection system.

Submitted 28 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: The editor has asked for confidence intervals. In this version, confidence intervals are added to the manuscript

arXiv:2003.13868 [pdf, other]

Lesion Conditional Image Generation for Improved Segmentation of Intracranial Hemorrhage from CT Images

Authors: Manohar Karki, Junghwan Cho, Seokhwan Ko

Abstract: Data augmentation can effectively resolve a scarcity of images when training machine-learning algorithms. It can make them more robust to unseen images. We present a lesion conditional Generative Adversarial Network LcGAN to generate synthetic Computed Tomography (CT) images for data augmentation. A lesion conditional image (segmented mask) is an input to both the generator and the discriminator of the LcGAN during training. The trained model generates contextual CT images based on input masks. We quantify the quality of the images by using a fully convolutional network (FCN) score and blurriness. We also train another classification network to select better synthetic images. These synthetic CT images are then augmented to our hemorrhagic lesion segmentation network. By applying this augmentation method on 2.5%, 10% and 25% of original data, segmentation improved by 12.8%, 6% and 1.6% respectively.

Submitted 30 March, 2020; originally announced March 2020.

arXiv:2002.02562 [pdf, other]

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

Authors: Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar

Abstract: In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently. The activations from both audio and label encoders are combined with a feed-forward layer to compute a probability distribution over the label space for every combination of acoustic frame position and label history. This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding instead of Transformer encoders. The model is trained with the RNN-T loss well-suited to streaming decoding. We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy. We also show that the full attention version of our model beats the state-of-the-art accuracy on the LibriSpeech benchmarks. Our results also show that we can bridge the gap between full attention and limited attention versions of our model by attending to a limited number of future frames.

Submitted 14 February, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: This is the final version of the paper submitted to the ICASSP 2020 on Oct 21, 2019

arXiv:1911.13181 [pdf, other]

doi 10.1145/3340531.3411940

ST-GRAT: A Novel Spatio-temporal Graph Attention Network for Accurately Forecasting Dynamically Changing Road Speed

Authors: Cheonbok Park, Chunggi Lee, Hyojin Bahng, Yunwon Tae, Kihwan Kim, Seungmin Jin, Sungahn Ko, Jaegul Choo

Abstract: Predicting road traffic speed is a challenging task due to different types of roads, abrupt speed change and spatial dependencies between roads; it requires the modeling of dynamically changing spatial dependencies among roads and temporal patterns over long input sequences. This paper proposes a novel spatio-temporal graph attention (ST-GRAT) that effectively captures the spatio-temporal dynamics in road networks. The novel aspects of our approach mainly include spatial attention, temporal attention, and spatial sentinel vectors. The spatial attention takes the graph structure information (e.g., distance between roads) and dynamically adjusts spatial correlation based on road states. The temporal attention is responsible for capturing traffic speed changes, and the sentinel vectors allow the model to retrieve new features from spatially correlated nodes or preserve existing features. The experimental results show that ST-GRAT outperforms existing models, especially in difficult conditions where traffic speeds rapidly change (e.g., rush hours). We additionally provide a qualitative study to analyze when and where ST-GRAT tended to make accurate predictions during rush-hour times.

Submitted 20 October, 2020; v1 submitted 29 November, 2019; originally announced November 2019.

Comments: to be published in CIKM-2020

arXiv:1911.08656 [pdf, other]

doi 10.1109/ICCVW.2019.00448

W-Net: Two-stage U-Net with misaligned data for raw-to-RGB mapping

Authors: Kwang-Hyun Uhm, Seung-Wook Kim, Seo-Won Ji, Sung-Jin Cho, Jun-Pyo Hong, Sung-Jea Ko

Abstract: Recent research on learning a mapping between raw Bayer images and RGB images has progressed with the development of deep convolutional neural networks. A challenging data set namely the Zurich Raw-to-RGB data set (ZRR) has been released in the AIM 2019 raw-to-RGB mapping challenge. In ZRR, input raw and target RGB images are captured by two different cameras and thus not perfectly aligned. Moreover, camera metadata such as white balance gains and color correction matrix are not provided, which makes the challenge more difficult. In this paper, we explore an effective network structure and a loss function to address these issues. We exploit a two-stage U-Net architecture and also introduce a loss function that is less variant to alignment and more sensitive to color differences. In addition, we show an ensemble of networks trained with different loss functions can bring a significant performance gain. We demonstrate the superiority of our method by achieving the highest score in terms of both the peak signal-to-noise ratio and the structural similarity and obtaining the second-best mean-opinion-score in the challenge.

Submitted 21 November, 2019; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: Accepted by ICCVW 2019

arXiv:1911.07424 [pdf, other]

doi 10.1109/ACCESS.2020.3001637

Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations

Authors: Cheol-hwan Yoo, Seo-won Ji, Yong-goo Shin, Seung-wook Kim, Sung-jea Ko

Abstract: 3D hand pose estimation from a single depth image plays an important role in computer vision and human-computer interaction. Although recent hand pose estimation methods using convolution neural network (CNN) have shown notable improvements in accuracy, most of them have a limitation that they rely on a complex network structure without fully exploiting the articulated structure of the hand. A hand, which is an articulated object, is composed of six local parts: the palm and five independent fingers. Each finger consists of sequential-joints that provide constrained motion, referred to as a kinematic chain. In this paper, we propose a hierarchically-structured convolutional recurrent neural network (HCRNN) with six branches that estimate the 3D position of the palm and five fingers independently. The palm position is predicted via fully-connected layers. Each sequential-joint, i.e. finger position, is obtained using a recurrent neural network (RNN) to capture the spatial dependencies between adjacent joints. Then the output features of the palm and finger branches are concatenated to estimate the global hand position. HCRNN directly takes the depth map as an input without a time-consuming data conversion, such as 3D voxels and point clouds. Experimental results on public datasets demonstrate that the proposed HCRNN not only outperforms most 2D CNN-based methods using the depth image as their inputs but also achieves competitive results with state-of-the-art 3D CNN-based methods with a highly efficient running speed of 285 fps on a single GPU.

Submitted 18 March, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

Journal ref: IEEE Access. 8 (2020) 114010-114019

arXiv:1911.03461 [pdf, other]

AIM 2019 Challenge on Image Demoireing: Methods and Results

Authors: Shanxin Yuan, Radu Timofte, Gregory Slabaugh, Ales Leonardis, Bolun Zheng, Xin Ye, Xiang Tian, Yaowu Chen, Xi Cheng, Zhenyong Fu, Jian Yang, Ming Hong, Wenying Lin, Wenjin Yang, Yanyun Qu, Hong-Kyu Shin, Joon-Yeon Kim, Sung-Jea Ko, Hang Dong, Yu Guo, Jie Wang, Xuan Ding, Zongyan Han, Sourya Dipta Das, Kuldeep Purohit, et al. (3 additional authors not shown)

Abstract: This paper reviews the first-ever image demoireing challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ICCV 2019. This paper describes the challenge, and focuses on the proposed solutions and their results. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. A new dataset, called LCDMoire, was created for this challenge, and consists of 10,200 synthetically generated image pairs (moire and clean ground truth). The challenge was divided into 2 tracks. Track 1 targeted fidelity, measuring the ability of demoire methods to obtain a moire-free image compared with the ground truth, while Track 2 examined the perceptual quality of demoire methods. The tracks had 60 and 39 registered participants, respectively. A total of eight teams competed in the final testing phase. The entries span the current state-of-the-art in the image demoireing problem.

Submitted 8 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1911.02498

arXiv:1906.00709 [pdf, other]

cGANs with Conditional Convolution Layer

Authors: Min-Cheol Sagong, Yong-Goo Shin, Yoon-Jae Yeo, Seung Park, Sung-Jea Ko

Abstract: Conditional generative adversarial networks (cGANs) have been widely researched to generate class conditional images using a single generator. However, in the conventional cGANs techniques, it is still challenging for the generator to learn condition-specific features, since a standard convolutional layer with the same weights is used regardless of the condition. In this paper, we propose a novel convolution layer, called the conditional convolution layer, which directly generates different feature maps by employing the weights which are adjusted depending on the conditions. More specifically, in each conditional convolution layer, the weights are conditioned in a simple but effective way through filter-wise scaling and channel-wise shifting operations. In contrast to the conventional methods, the proposed method with a single generator can effectively handle condition-specific characteristics. The experimental results on CIFAR, LSUN and ImageNet datasets show that the generator with the proposed conditional convolution layer achieves a higher quality of conditional image generation than that with the standard convolution layer.

Submitted 8 April, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: Submitted to IEEE Trans. Neural Networks and Learning Systems (TNNLS)

arXiv:1905.09010 [pdf, other]

doi 10.1109/TNNLS.2020.2978501

PEPSI++: Fast and Lightweight Network for Image Inpainting

Authors: Yong-Goo Shin, Min-Cheol Sagong, Yoon-Jae Yeo, Seung-Wook Kim, Sung-Jea Ko

Abstract: Among the various generative adversarial network (GAN)-based image inpainting methods, a coarse-to-fine network with a contextual attention module (CAM) has shown remarkable performance. However, owing to two stacked generative networks, the coarse-to-fine network needs numerous computational resources such as convolution operations and network parameters, which result in low speed. To address this problem, we propose a novel network architecture called PEPSI: parallel extended-decoder path for semantic inpainting network, which aims at reducing the hardware costs and improving the inpainting performance. PEPSI consists of a single shared encoding network and parallel decoding networks called coarse and inpainting paths. The coarse path produces a preliminary inpainting result to train the encoding network for the prediction of features for the CAM. Simultaneously, the inpainting path generates higher inpainting quality using the refined features reconstructed via the CAM. In addition, we propose Diet-PEPSI that significantly reduces the network parameters while maintaining the performance. In Diet-PEPSI, to capture the global contextual information with low hardware costs, we propose novel rate-adaptive dilated convolutional layers, which employ the common weights but produce dynamic features depending on the given dilation rates. Extensive experiments comparing the performance with state-of-the-art image inpainting methods demonstrate that both PEPSI and Diet-PEPSI improve the qualitative scores, i.e. the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as well as significantly reduce hardware costs such as computational time and the number of network parameters.

Submitted 6 March, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

Comments: Accepted to IEEE transactions on Neural Networks and Learning Systems. To be published

arXiv:1905.05916 [pdf]

doi 10.1109/TIP.2019.2953352

Unsupervised Deep Contrast Enhancement with Power Constraint for OLED Displays

Authors: Yong-Goo Shin, Seung Park, Yoon-Jae Yeo, Min-Jae Yoo, Sung-Jea Ko

Abstract: Various power-constrained contrast enhancement (PCCE) techniques have been applied to an organic light emitting diode (OLED) display for reducing the power demands of the display while preserving the image quality. In this paper, we propose a new deep learning-based PCCE scheme that constrains the power consumption of the OLED displays while enhancing the contrast of the displayed image. In the proposed method, the power consumption is constrained by simply reducing the brightness by a certain ratio, whereas the perceived visual quality is preserved as much as possible by enhancing the contrast of the image using a convolutional neural network (CNN). Furthermore, our CNN can learn the PCCE technique without a reference image by unsupervised learning. Experimental results show that the proposed method is superior to conventional ones in terms of image quality assessment metrics such as a visual saliency-induced index (VSI) and a measure of enhancement (EME).

Submitted 9 December, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

Comments: Accepted to IEEE transactions on Image Processing. To be published

arXiv:1904.13010 [pdf, other]

Realizing Multi-Point Vehicular Positioning via Millimeter-wave Transmission

Authors: Zezhong Zhang, Seung-Woo Ko, Rui Wang, Kaibin Huang

Abstract: Multi-point detection of the full-scale environment is an important issue in autonomous driving. The state-of-the-art positioning technologies (such as RADAR and LIDAR) are incapable of real-time detection without line-of-sight. To address this issue, this paper presents a novel multi-point vehicular positioning technology via millimeter-wave (mmWave) transmission that exploits multi-path reflection from a target vehicle (TV) to a sensing vehicle (SV), which enables the SV to fast capture both the shape and location information of the TV in non-line-of-sight (NLoS) under the assistance of multi-path reflections. A phase-difference-of-arrival (PDoA) based hyperbolic positioning algorithm is designed to achieve the synchronization between the TV and SV. The stepped-frequency-continuous-wave (SFCW) is utilized as signals for multi-point detection of the TVs. Transceiver separation enables our approach to work in NLoS conditions and achieve much lower latency compared with conventional positioning techniques.

Submitted 29 April, 2019; originally announced April 2019.

Comments: 9 pages, 6 figures, conference version

arXiv:1804.03541 [pdf, other]

Sensing Hidden Vehicles by Exploiting Multi-Path V2V Transmission

Authors: Kaifeng Han, Seung-Woo Ko, Hyukjin Chae, Byoung-Hoon Kim, Kaibin Huang

Abstract: This paper presents a technology of sensing hidden vehicles by exploiting multi-path vehicle-to-vehicle (V2V) communication. This overcomes the limitation of existing RADAR technologies that require line-of-sight (LoS), thereby enabling more intelligent manoeuvre in autonomous driving and improving its safety. The proposed technology relies on transmission of orthogonal waveforms over different antennas at the target (hidden) vehicle. Even without LoS, the resultant received signal enables the sensing vehicle to detect the position, shape, and driving direction of the hidden vehicle by jointly analyzing the geometry (AoA/AoD/propagation distance) of individual propagation paths. The accuracy of the proposed technique is validated by realistic simulation including both highway and rural scenarios.

Submitted 10 April, 2018; originally announced April 2018.

Comments: 5 pages, 5 figures

Showing 1–25 of 25 results for author: Ko, S