-
Assessment of Clonal Hematopoiesis of Indeterminate Potential from Cardiac Magnetic Resonance Imaging using Deep Learning in a Cardio-oncology Population
Authors:
Sangeon Ryu,
Shawn Ahn,
Jeacy Espinoza,
Alokkumar Jha,
Stephanie Halene,
James S. Duncan,
Jennifer M Kwan,
Nicha C. Dvornek
Abstract:
Background: We propose a novel deep learning method to identify patients who are likely to have clonal hematopoiesis of indeterminate potential (CHIP), a condition characterized by the presence of somatic mutations in hematopoietic stem cells without detectable hematologic malignancy. Methods: We developed a convolutional neural network (CNN) to predict CHIP status using four different views from standard delayed gadolinium-enhanced cardiac magnetic resonance imaging (CMR). We used 5-fold cross-validation on 82 cardio-oncology patients to assess the performance of our model. We compared different algorithms for combining the image-level CNN predictions into a patient-level prediction to find the optimal method. Results: The best model achieved an area under the receiver operating characteristic curve of 0.85 and an accuracy of 82%. Conclusions: We conclude that a deep learning-based diagnostic approach for CHIP using CMR is promising.
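The abstract does not name the aggregation algorithms that were compared. As a minimal sketch under that caveat, three common ways to turn per-view CNN probabilities into one patient-level call are mean probability, maximum probability, and majority vote; the helper `patient_prediction` and its 0.5 threshold are hypothetical.

```python
from statistics import mean


def patient_prediction(image_probs, method="mean", threshold=0.5):
    """Combine image-level CHIP probabilities for one patient (hypothetical helper).

    Returns (patient_score, is_positive).
    """
    if not image_probs:
        raise ValueError("need at least one image-level prediction")
    if method == "mean":
        # average the predicted probability across all views
        score = mean(image_probs)
    elif method == "max":
        # flag the patient if any single view is confidently positive
        score = max(image_probs)
    elif method == "vote":
        # fraction of views whose probability clears the threshold
        score = mean(1.0 if p >= threshold else 0.0 for p in image_probs)
    else:
        raise ValueError(f"unknown method: {method}")
    return score, score >= threshold
```

For example, four views with probabilities `[0.9, 0.2, 0.6, 0.7]` give a mean score of 0.6 and a vote score of 0.75, so both rules call the patient positive. Mean pooling uses the full probability information, while voting is less sensitive to one badly calibrated view.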
Submitted 26 June, 2024;
originally announced June 2024.
-
HILCodec: High Fidelity and Lightweight Neural Audio Codec
Authors:
Sunghwan Ahn,
Beom Jun Woo,
Min Hyun Han,
Chanyeong Moon,
Nam Soo Kim
Abstract:
The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consistently as the network depth increases. We analyze the root cause of this phenomenon and suggest a variance-constrained design. We also reveal various distortions in previous waveform-domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.
Submitted 7 May, 2024;
originally announced May 2024.
-
Human-Machine Interaction in Automated Vehicles: Reducing Voluntary Driver Intervention
Authors:
Xinzhi Zhong,
Yang Zhou,
Varshini Kamaraj,
Zhenhao Zhou,
Wissam Kontar,
Dan Negrut,
John D. Lee,
Soyoung Ahn
Abstract:
This paper develops a novel car-following control method for Automated Vehicles (AVs) that reduces voluntary driver interventions and improves traffic stability. Through a combination of experimental and empirical analysis, we show how voluntary driver interventions can instigate substantial traffic disturbances that are amplified as they propagate upstream. Motivated by these findings, we present a framework for driver intervention based on evidence accumulation (EA), which describes how the driver's distrust in automation evolves until it ultimately results in intervention. Informed by the EA framework, we propose a deep reinforcement learning (DRL)-based car-following control for AVs that is strategically designed to mitigate unnecessary driver intervention and improve traffic stability. Numerical experiments demonstrate the effectiveness of the proposed control model.
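Evidence accumulation is commonly modeled as a running (possibly leaky) total of evidence that triggers a response once it crosses a threshold. The sketch below illustrates that idea for driver distrust; the per-step evidence values, the `leak` factor, the threshold, and the helper name `time_to_intervention` are all assumptions for illustration, not the paper's calibrated model.

```python
def time_to_intervention(evidence, threshold, leak=0.0):
    """Return the first time step at which accumulated distrust reaches threshold.

    evidence: per-step evidence of distrust (negative values rebuild trust).
    leak: fraction of accumulated distrust that decays each step (0 = no decay).
    Returns None if the driver never intervenes.
    """
    level = 0.0
    for t, e in enumerate(evidence):
        # leaky accumulation; distrust is not allowed to drop below zero
        level = max(0.0, (1.0 - leak) * level + e)
        if level >= threshold:
            return t
    return None
```

With evidence `[0.2, 0.3, 0.4, 0.5]` and a threshold of 0.8, the accumulated level reaches 0.9 at step 2, so the sketch predicts an intervention there. A controller that keeps per-step evidence low (for example, by avoiding abrupt gaps or decelerations) delays or avoids the crossing.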
Submitted 8 April, 2024;
originally announced April 2024.
-
Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same
Authors:
Sungjun Ahn,
Hyun-Jeong Yim,
Youngwan Lee,
Sung-Ik Park
Abstract:
This paper introduces a media service model that exploits artificial intelligence (AI) video generators at the receiver end. This proposal departs from the traditional multimedia ecosystem, which relies entirely on in-house production, by shifting part of the content creation onto the receiver. We bring a semantic process into the framework, allowing the distribution network to provide service elements that prompt the content generator, rather than distributing encoded data of fully finished programs. The service elements, collectively referred to as semantic sources, include finely tailored text descriptions, lightweight image data of some objects, or application programming interfaces; the user terminal translates the received semantic data into video frames. Empowered by the random nature of generative AI, users can then experience super-personalized services. The proposed idea covers situations in which the user receives element packages from different service providers, either as a sequence of packages over time or as multiple packages at the same time. Provided that in-context coherence and content integrity are guaranteed, these combinatorial dynamics will amplify service diversity, allowing users to always chance upon new experiences. This work particularly targets short-form videos and advertisements, with which users easily become fatigued when they see the same frame sequence every time. In those use cases, the content provider's role will be recast from a thorough producer to a scripter of semantic sources. Overall, this work explores a new form of media ecosystem facilitated by receiver-embedded generative models, featuring both random content dynamics and enhanced delivery efficiency.
Submitted 18 February, 2024;
originally announced February 2024.
-
Hybrid Neural Representations for Spherical Data
Authors:
Hyomin Kim,
Yunhui Jang,
Jaeho Lee,
Sungsoo Ahn
Abstract:
In this paper, we study hybrid neural representations for spherical data, a domain of increasing relevance in scientific research. In particular, our work focuses on weather and climate data as well as cosmic microwave background (CMB) data. Although previous studies have delved into coordinate-based neural representations for spherical signals, they often fail to capture the intricate details of highly nonlinear signals. To address this limitation, we introduce a novel approach named Hybrid Neural Representations for Spherical data (HNeR-S). Our main idea is to use spherical feature-grids to obtain positional features, which are combined with a multilayer perceptron to predict the target signal. We consider feature-grids with equirectangular and hierarchical equal area isolatitude pixelization structures that align with weather data and CMB data, respectively. We extensively verify the effectiveness of HNeR-S for regression, super-resolution, temporal interpolation, and compression tasks.
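To make the feature-grid idea concrete, here is a minimal sketch of an equirectangular (latitude-longitude) grid lookup with bilinear interpolation and longitude wrap-around, whose output is passed through a small ReLU multilayer perceptron. The grid layout, the helper names `equirect_lookup` and `mlp`, and the choice of bilinear interpolation are assumptions for illustration; in a trained model the grid values and MLP weights would be learned.

```python
import numpy as np


def equirect_lookup(grid, lat, lon):
    """Bilinearly interpolate a (H, W, C) lat-lon feature grid at one point (degrees).

    Assumed layout: row 0 is latitude +90, row H-1 is latitude -90;
    columns cover longitude [0, 360) and wrap around at the seam.
    """
    H, W, _ = grid.shape
    r = (90.0 - lat) / 180.0 * (H - 1)          # fractional row index
    c = (lon % 360.0) / 360.0 * W               # fractional column index
    r0 = int(np.clip(np.floor(r), 0, H - 2))
    fr = r - r0
    c0 = int(np.floor(c)) % W
    fc = c - np.floor(c)
    c1 = (c0 + 1) % W                            # wrap across the 0/360 seam
    top = (1 - fc) * grid[r0, c0] + fc * grid[r0, c1]
    bot = (1 - fc) * grid[r0 + 1, c0] + fc * grid[r0 + 1, c1]
    return (1 - fr) * top + fr * bot


def mlp(x, layers):
    """Apply (weight, bias) pairs with ReLU between layers and a linear output."""
    for i, (weight, bias) in enumerate(layers):
        x = x @ weight + bias
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x
```

A prediction at a point would then be `mlp(equirect_lookup(grid, lat, lon), layers)`. The wrap-around in longitude matters because an equirectangular grid is periodic in that direction; the hierarchical equal-area pixelization mentioned for CMB data would need a different lookup.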
Submitted 5 February, 2024;
originally announced February 2024.
-
Heteroscedastic Uncertainty Estimation for Probabilistic Unsupervised Registration of Noisy Medical Images
Authors:
Xiaoran Zhang,
Daniel H. Pak,
Shawn S. Ahn,
Xiaoxiao Li,
Chenyu You,
Lawrence Staib,
Albert J. Sinusas,
Alex Wong,
James S. Duncan
Abstract:
This paper proposes a heteroscedastic uncertainty estimation framework for unsupervised medical image registration. Existing methods rely on objectives (e.g., mean-squared error) that assume a uniform noise level across the image, disregarding the heteroscedastic and input-dependent characteristics of the noise distribution in real-world medical images. This further introduces noisy gradients due to undesired penalization of outliers, causing unnatural deformation and performance degradation. To mitigate this, we model the heteroscedastic noise using a separate variance estimator and propose an adaptive weighting scheme for the displacement estimator based on a relative $γ$-exponentiated signal-to-noise ratio (SNR). This prevents the model from being driven away by spurious gradients from error residuals, leading to more accurate displacement estimation. To illustrate the versatility and effectiveness of the proposed method, we tested our framework on two representative registration architectures across three medical image datasets. Our proposed framework consistently outperforms other baselines both quantitatively and qualitatively while also providing accurate and sensible uncertainty measures. Paired t-tests show that our improvements in registration accuracy are statistically significant. The code will be publicly available at https://voldemort108x.github.io/hetero_uncertainty/.
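The exact weighting formula is not given in the abstract. The following is a minimal sketch of one plausible reading: treat the inverse of the predicted per-pixel variance as an SNR map, normalize it to mean 1 over the image (a "relative" SNR), raise it to the power `gamma`, and use it to weight the squared registration residual. The normalization, the log-variance input, and the name `snr_weighted_mse` are assumptions. In a training framework the weights would presumably be held constant (stop-gradient) when updating the displacement estimator.

```python
import numpy as np


def snr_weighted_mse(fixed, warped, log_var, gamma=1.0):
    """Hypothetical sketch: MSE weighted by a relative, gamma-exponentiated SNR map.

    fixed, warped: images of the same shape.
    log_var: predicted per-pixel log-variance of the noise (same shape).
    gamma = 0 recovers the ordinary mean-squared error.
    """
    snr = np.exp(-log_var)           # 1 / sigma^2 for each pixel
    rel = snr / snr.mean()           # relative SNR, mean 1 over the image
    weights = rel ** gamma           # gamma controls how strongly noise is down-weighted
    residual = (fixed - warped) ** 2
    return float(np.mean(weights * residual))
```

Pixels with a large predicted variance receive weights near zero, so large residuals in noisy regions contribute little to the loss, which is the outlier-robustness effect the abstract describes.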
Submitted 30 November, 2023;
originally announced December 2023.
-
Learning Driver Models for Automated Vehicles via Knowledge Sharing and Personalization
Authors:
Wissam Kontar,
Xinzhi Zhong,
Soyoung Ahn
Abstract:
This paper describes a framework for learning driver models for Automated Vehicles (AVs) via knowledge sharing between vehicles and personalization. The innate variability in the transportation system makes it exceptionally challenging to expose AVs to all possible driving scenarios during empirical experimentation or testing. Consequently, AVs could be blind to certain encounters that are detrimental to their safe and efficient operation. It is therefore critical to share knowledge across AVs to increase their exposure to driving scenarios occurring in the real world. This paper explores a method to collaboratively train a driver model by sharing knowledge and borrowing strength across vehicles while retaining a personalized model tailored to each vehicle's unique conditions and properties. Our model uses a federated learning approach to enable collaboration among multiple vehicles without the need to share raw data between them. We showcase our method's performance in experimental simulations. Such an approach to learning finds several applications across transportation engineering, including intelligent transportation systems, traffic management, and vehicle-to-vehicle communication. Code and a sample dataset are available at the project page https://github.com/wissamkontar.
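The sketch below shows the general pattern of federated learning with personalization: vehicles share only parameter vectors, a server averages them weighted by local data size (the standard FedAvg rule), and each vehicle blends the shared model with its own. The linear interpolation in `personalize` and the value of `alpha` are hypothetical stand-ins, not the personalization scheme used in the paper.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average parameter vectors weighted by local data size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def personalize(global_w, local_w, alpha=0.5):
    """Blend the shared model with one vehicle's own model (hypothetical interpolation).

    alpha = 1 keeps the purely local model; alpha = 0 adopts the global model.
    """
    return alpha * np.asarray(local_w, dtype=float) + (1 - alpha) * np.asarray(global_w, dtype=float)
```

For two vehicles with parameters `[1, 2]` and `[3, 4]` trained on 1 and 3 samples respectively, the shared model is `[2.5, 3.5]`. No raw trajectories leave either vehicle; only these parameter vectors are exchanged.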
Submitted 31 August, 2023;
originally announced August 2023.
-
On the Spatial-Wideband Effects in Millimeter-Wave Cell-Free Massive MIMO
Authors:
Seyoung Ahn,
Soohyeong Kim,
Yongseok Kwon,
Joohan Park,
Jiseung Youn,
Sunghyun Cho
Abstract:
In this paper, we investigate the spatial-wideband effects in cell-free massive MIMO (CF-mMIMO) systems in mmWave bands. The use of mmWave frequencies brings challenges such as signal attenuation and the need for denser deployments, such as ultra-dense networks (UDN), to maintain communication performance. CF-mMIMO is introduced as a solution, in which distributed access points (APs) transmit signals to a central processing unit (CPU) for joint processing. CF-mMIMO offers advantages in reducing non-line-of-sight (NLOS) conditions and overcoming signal blockage. We investigate the synchronization problem in CF-mMIMO caused by time delays between APs and propose a minimum cyclic prefix length to mitigate inter-symbol interference (ISI) in OFDM systems. Furthermore, we analyze the spatial correlations of channel responses in the frequency-phase domain and examine their impact on system performance. The findings contribute to improving the performance of CF-mMIMO systems and to the effective utilization of mmWave communication.
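The intuition behind a minimum cyclic prefix can be shown with simple arithmetic: if signals from different APs reach a user with different propagation delays, the CP must be at least as long as that arrival-time spread (plus any per-link multipath delay spread) to avoid ISI. The sketch below assumes free-space propagation at the speed of light and measures the CP in samples; the function name and its inputs are illustrative, not the paper's derived bound.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def min_cp_samples(ap_distances_m, sample_rate_hz, channel_delay_spread_s=0.0):
    """Smallest CP length (in samples) covering the arrival-time spread across APs.

    Assumes each AP-to-user delay is distance / c and that the residual multipath
    delay spread of a single link is supplied separately.
    """
    delays = [d / SPEED_OF_LIGHT for d in ap_distances_m]
    spread = max(delays) - min(delays) + channel_delay_spread_s
    return math.ceil(spread * sample_rate_hz)
```

For example, APs at 100 m and 400 m from the user differ by 300 m, or about 1.0 µs; at a 1 GHz sampling rate, typical of wide mmWave bandwidths, this already requires a CP of 1001 samples, which shows why the inter-AP delay dominates the CP design.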
Submitted 6 July, 2023;
originally announced July 2023.
-
EM-Network: Oracle Guided Self-distillation for Sequence Learning
Authors:
Ji Won Yoon,
Sunghwan Ahn,
Hyeonseung Lee,
Minchan Kim,
Seok Min Kim,
Nam Soo Kim
Abstract:
We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. In contrast to conventional methods, it is trained with oracle guidance, which is derived from the target sequence. Since the oracle guidance compactly represents the target-side context that can assist the sequence model in solving the task, the EM-Network achieves better predictions than a model that uses only the source input. To allow the sequence model to inherit the promising capability of the EM-Network, we propose a new self-distillation strategy, in which the original sequence model can benefit from the knowledge of the EM-Network in a one-stage manner. We conduct comprehensive experiments on two types of seq2seq models: connectionist temporal classification (CTC) for speech recognition and attention-based encoder-decoder (AED) for machine translation. Experimental results demonstrate that the EM-Network significantly advances the current state-of-the-art approaches, improving over the best prior work on speech recognition and establishing state-of-the-art performance on WMT'14 and IWSLT'14.
Submitted 14 June, 2023;
originally announced June 2023.
-
Restoring Original Signal From Pile-up Signal using Deep Learning
Authors:
C. H. Kim,
S. Ahn,
K. Y. Chae,
J. Hooker,
G. V. Rogachev
Abstract:
Pile-up signals are frequently produced in experimental physics. They produce inaccurate physics data with high uncertainty and cause various problems, so correcting pile-up signals is essential. In this study, we implemented a deep learning method to restore the original signals from the pile-up signals. We showed that a deep learning model could accurately reconstruct the original signal waveforms from the pile-up waveforms. By substituting the pile-up signals with the original signals predicted by the model, the energy and timing resolutions of the data are notably enhanced. The model also significantly improved the quality of the particle identification plot and particle tracks. This method is applicable to similar problems, such as separating multiple signals or correcting pile-up signals with other types of noise and backgrounds.
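Supervised restoration of this kind needs (pile-up, original) waveform pairs. One common way to obtain them is to superimpose a second, delayed pulse on a clean pulse. The sketch below does this with an assumed double-exponential pulse shape; the pulse model, time constants, delay, and amplitudes are hypothetical and not taken from the paper's detector data.

```python
import numpy as np


def pulse(t, t0, amplitude, rise=2.0, decay=20.0):
    """Double-exponential detector pulse starting at t0 (an assumed pulse shape)."""
    s = np.clip(t - t0, 0.0, None)   # time since pulse onset; zero before t0
    return amplitude * (np.exp(-s / decay) - np.exp(-s / rise))


def make_pileup_pair(n_samples=200, t0=30.0, amp=1.0, dt=15.0, amp2=0.6):
    """Return (piled_up, original): the original pulse plus a second overlapping pulse.

    A restoration network would take piled_up as input and original as target.
    """
    t = np.arange(n_samples, dtype=float)
    original = pulse(t, t0, amp)
    piled_up = original + pulse(t, t0 + dt, amp2)
    return piled_up, original
```

Sweeping `dt` and `amp2` over realistic ranges would produce a training set covering both well-separated and heavily overlapping pile-up.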
Submitted 24 April, 2023;
originally announced April 2023.
-
Multimodal Speech Recognition for Language-Guided Embodied Agents
Authors:
Allen Chang,
Xiaoyuan Zhu,
Aarav Monga,
Seoho Ahn,
Tejas Srinivasan,
Jesse Thomason
Abstract:
Benchmarks for language-guided embodied agents typically assume text-based instructions, but deployed agents will encounter spoken instructions. While Automatic Speech Recognition (ASR) models can bridge the input gap, erroneous ASR transcripts can hurt the agents' ability to complete tasks. In this work, we propose training a multimodal ASR model to reduce errors in transcribing spoken instructions by considering the accompanying visual context. We train our model on a dataset of spoken instructions, synthesized from the ALFRED task completion dataset, where we simulate acoustic noise by systematically masking spoken words. We find that utilizing visual observations facilitates masked word recovery, with multimodal ASR models recovering up to 30% more masked words than unimodal baselines. We also find that a text-trained embodied agent successfully completes tasks more often by following transcribed instructions from multimodal ASR models. Code is available at github.com/Cylumn/embodied-multimodal-asr.
Submitted 9 October, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition
Authors:
Ji Won Yoon,
Beom Jun Woo,
Sunghwan Ahn,
Hyeonseung Lee,
Nam Soo Kim
Abstract:
Recently, advances in deep learning have brought considerable improvement to end-to-end speech recognition, simplifying the traditional pipeline while producing promising results. Among end-to-end models, the connectionist temporal classification (CTC)-based model has attracted research interest due to its non-autoregressive nature. However, such CTC models require a heavy computational cost to achieve outstanding performance. To mitigate the computational burden, we propose a simple yet effective knowledge distillation (KD) method for the CTC framework, namely Inter-KD, that additionally transfers the teacher's knowledge to the intermediate CTC layers of the student network. Experimental results on LibriSpeech verify that Inter-KD outperforms conventional KD methods. Without using any language model (LM) or data augmentation, Inter-KD improves the word error rate (WER) from 8.85% to 6.30% on test-clean.
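A frame-level distillation loss for CTC models can be written as the KL divergence between the teacher's and student's per-frame output distributions. The sketch below applies that loss to several student layers at once, which is the general shape of distilling into intermediate CTC layers. It assumes the intermediate layers have already been projected to vocabulary logits and that the teacher's final-layer output is the target for every layer; the paper may pair layers or weight terms differently, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def frame_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean over frames of KL(teacher || student) on per-frame CTC posteriors.

    Both inputs have shape (frames, vocab), including the CTC blank symbol.
    """
    t = log_softmax(teacher_logits / temperature)
    s = log_softmax(student_logits / temperature)
    return float(np.mean(np.sum(np.exp(t) * (t - s), axis=-1)))


def inter_kd_loss(teacher_logits, student_layer_logits, temperature=1.0):
    """Hypothetical: distill the teacher output into every listed student CTC layer."""
    return sum(frame_kd_loss(teacher_logits, s, temperature) for s in student_layer_logits)
```

In training, this term would be added to the usual CTC losses of the student's final and intermediate layers. The loss is zero only when a layer's per-frame distribution matches the teacher's.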
△ Less
Submitted 28 November, 2022;
originally announced November 2022.
-
Bayesian Methods in Automated Vehicle's Car-following Uncertainties: Enabling Strategic Decision Making
Authors:
Wissam Kontar,
Soyoung Ahn
Abstract:
This paper proposes a methodology to estimate uncertainty in automated vehicle (AV) dynamics in real time via Bayesian inference. Based on the estimated uncertainty, the method aims to continuously monitor the car-following (CF) performance of the AV to support strategic actions that maintain a desired performance. Our methodology consists of three sequential components: (i) Stochastic Gradient Langevin Dynamics (SGLD) is adopted to estimate parameter uncertainty in vehicular dynamics in real time, (ii) dynamic monitoring of car-following stability (local and string-wise), and (iii) strategic actions for control adjustment if an anomaly is detected. The proposed methodology provides a means to gauge AV car-following performance in real time and to preserve the desired performance against real-time uncertainties that are unaccounted for in the vehicle control algorithm.
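SGLD draws approximate posterior samples by taking a gradient step on the log-posterior and adding Gaussian noise whose variance equals the step size: theta <- theta + (eps/2) * grad log p(theta | data) + N(0, eps). The sketch below shows that update on a toy one-parameter Gaussian posterior. In practice the gradient would be a minibatch estimate for the vehicle-dynamics parameters; the exact-gradient toy target here is an assumption chosen so the result can be checked.

```python
import numpy as np


def sgld(grad_log_post, theta0, step_size, n_steps, rng):
    """Stochastic Gradient Langevin Dynamics with a constant step size.

    Each step: theta += step_size / 2 * grad_log_post(theta) + N(0, step_size).
    Returns an array of shape (n_steps, dim) with every iterate.
    """
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
        theta = theta + 0.5 * step_size * grad_log_post(theta) + noise
        samples.append(theta.copy())
    return np.array(samples)
```

For a Gaussian posterior with mean 1.5 and standard deviation 0.5, `grad_log_post` is `lambda th: -(th - 1.5) / 0.25`. After discarding a burn-in period, the sample mean and spread approximate the posterior mean and uncertainty, which is the kind of quantity a real-time monitor could track.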
Submitted 24 October, 2022;
originally announced October 2022.
-
Learning correspondences of cardiac motion from images using biomechanics-informed modeling
Authors:
Xiaoran Zhang,
Chenyu You,
Shawn Ahn,
Juntang Zhuang,
Lawrence Staib,
James Duncan
Abstract:
Learning spatial-temporal correspondences in cardiac motion from images is important for understanding the underlying dynamics of cardiac anatomical structures. Many methods explicitly impose smoothness constraints such as the $\mathcal{L}_2$ norm on the displacement vector field (DVF), while usually ignoring biomechanical feasibility in the transformation. Other geometric constraints either regularize specific regions of interest such as imposing incompressibility on the myocardium or introduce additional steps such as training a separate network-based regularizer on physically simulated datasets. In this work, we propose an explicit biomechanics-informed prior as regularization on the predicted DVF in modeling a more generic biomechanically plausible transformation within all cardiac structures without introducing additional training complexity. We validate our methods on two publicly available datasets in the context of 2D MRI data and perform extensive experiments to illustrate the effectiveness and robustness of our proposed methods compared to other competing regularization schemes. Our proposed methods better preserve biomechanical properties by visual assessment and show advantages in segmentation performance using quantitative evaluation metrics. The code is publicly available at https://github.com/Voldemort108X/bioinformed_reg.
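For reference, the conventional $\mathcal{L}_2$ smoothness constraint mentioned above penalizes the squared spatial gradient of the DVF. The sketch below computes it with forward differences for a 2D field stored as an array of shape (2, H, W). It illustrates only the baseline regularizer, not the proposed biomechanics-informed prior, and the name `l2_smoothness` is hypothetical.

```python
import numpy as np


def l2_smoothness(dvf):
    """Mean squared spatial gradient of a 2D DVF with shape (2, H, W).

    Channel 0 and 1 hold the two displacement components; gradients are
    approximated with forward differences along the H and W axes.
    """
    dy = np.diff(dvf, axis=1)   # differences between neighbouring rows
    dx = np.diff(dvf, axis=2)   # differences between neighbouring columns
    return float(np.mean(dy ** 2) + np.mean(dx ** 2))
```

A constant (pure translation) field has zero penalty, while any local stretching, shearing, or compression is penalized uniformly regardless of whether it is biomechanically plausible, which is the limitation the abstract points out.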
Submitted 1 September, 2022;
originally announced September 2022.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer
, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even infeasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models by only sharing numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and a 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Authors:
Minchan Kim,
Myeonghun Jeong,
Byoung ** Choi,
Sunghwan Ahn,
Joun Yeop Lee,
Nam Soo Kim
Abstract:
Training a text-to-speech (TTS) model requires a large-scale text-labeled speech corpus, which is troublesome to collect. In this paper, we propose a transfer learning framework for TTS that utilizes a large amount of unlabeled speech for pre-training. By leveraging the wav2vec2.0 representation, unlabeled speech can substantially improve performance, especially when labeled speech is scarce. We also extend the proposed method to zero-shot multi-speaker TTS (ZS-TTS). The experimental results verify the effectiveness of the proposed method in terms of naturalness, intelligibility, and speaker generalization. We highlight that the single-speaker TTS model fine-tuned on only 10 minutes of labeled data outperforms the other baselines, and that the ZS-TTS model fine-tuned on only 30 minutes of a single-speaker dataset can generate the voice of an arbitrary speaker, owing to pre-training on an unlabeled multi-speaker speech corpus.
Submitted 6 October, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Deep learning-based reconstruction of highly accelerated 3D MRI
Authors:
Sangtae Ahn,
Uri Wollner,
Graeme McKinnon,
Isabelle Heukensfeldt Jansen,
Rafi Brada,
Dan Rettmann,
Ty A. Cashen,
John Huston,
J. Kevin DeMarco,
Robert Y. Shih,
Joshua D. Trzasko,
Christopher J. Hardy,
Thomas K. F. Foo
Abstract:
Purpose: To accelerate brain 3D MRI scans by using a deep learning method for reconstructing images from highly-undersampled multi-coil k-space data
Methods: DL-Speed, an unrolled optimization architecture with dense skip-layer connections, was trained on 3D T1-weighted brain scan data to reconstruct complex-valued images from highly-undersampled k-space data. The trained model was evaluated on 3D MPRAGE brain scan data retrospectively undersampled with a 10-fold acceleration and compared to a conventional parallel imaging method with a 2-fold acceleration. Experienced radiologists rated SNR, artifacts, gray/white matter contrast, resolution/sharpness, deep gray matter, cerebellar vermis, anterior commissure, and overall quality on a 5-point Likert scale. In addition, the trained model was tested on retrospectively undersampled 3D T1-weighted LAVA (Liver Acquisition with Volume Acceleration) abdominal scan data, and on prospectively undersampled 3D MPRAGE scans in three healthy volunteers and a prospectively undersampled LAVA scan in one healthy volunteer.
Results: The qualitative scores for DL-Speed with a 10-fold acceleration were higher than or equal to those for the parallel imaging with 2-fold acceleration. DL-Speed outperformed a compressed sensing method in quantitative metrics on retrospectively-undersampled LAVA data. DL-Speed was demonstrated to perform reasonably well on prospectively-undersampled scan data, realizing a 2-5 times reduction in scan time.
Conclusion: DL-Speed was shown to accelerate 3D MPRAGE and LAVA with up to a net 10-fold acceleration, achieving 2-5 times faster scans compared to conventional parallel imaging and acceleration, while maintaining diagnostic image quality and real-time reconstruction. The brain scan data-trained DL-Speed also performed well when reconstructing abdominal LAVA scan data, demonstrating versatility of the network.
Submitted 9 March, 2022;
originally announced March 2022.
-
Learning Multiple Probabilistic Degradation Generators for Unsupervised Real World Image Super Resolution
Authors:
Sangyun Lee,
Sewoong Ahn,
Kwang** Yoon
Abstract:
Unsupervised real-world super resolution (USR) aims to restore high-resolution (HR) images given low-resolution (LR) inputs, and its difficulty stems from the absence of a paired dataset. One of the most common approaches is to synthesize noisy LR images using GANs (i.e., degradation generators) and use the resulting synthetic dataset to train the model in a supervised manner. Although the goal of training the degradation generator is to approximate the distribution of LR images given an HR image, previous works have heavily relied on the unrealistic assumption that this conditional distribution is a delta function, and have learned a deterministic mapping from the HR image to an LR image. In this paper, we show that we can improve the performance of USR models by relaxing this assumption, and we propose to train a probabilistic degradation generator. Our probabilistic degradation generator can be viewed as a deep hierarchical latent variable model and is better suited to modeling the complex conditional distribution. We also reveal a notable connection with the noise injection of StyleGAN. Furthermore, we train multiple degradation generators to improve mode coverage and apply collaborative learning for ease of training. We outperform several baselines on benchmark datasets in terms of PSNR and SSIM and demonstrate the robustness of our method on unseen data distributions. Code is available at https://github.com/sangyun884/MSSR.
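The difference between a deterministic and a probabilistic degradation generator can be illustrated with StyleGAN-style noise injection: a single Gaussian noise map is added to every feature channel, scaled by a per-channel factor. The sketch below assumes a toy "generator" that is just 2×2 average pooling followed by noise injection, with fixed (not learned) scales; it is a hypothetical illustration, not the MSSR code. With all scales at zero it collapses to the delta-function (deterministic) case.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on a (C, H, W) array with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def noise_injection(features, scale, rng):
    """Add one noise map shared by all channels, scaled per channel (StyleGAN-style)."""
    noise = rng.standard_normal((1,) + features.shape[1:])
    return features + scale[:, None, None] * noise

def degrade(hr, scale, rng):
    """Toy stochastic HR -> LR map: downsample, then inject noise."""
    return noise_injection(avg_pool2(hr), scale, rng)
```

Sampling `degrade` repeatedly for the same HR image yields different LR images, which is what lets a generator of this kind model a conditional distribution rather than a single mapping.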
Submitted 21 August, 2022; v1 submitted 25 January, 2022;
originally announced January 2022.
-
Implementation of Noise-Shaped Signaling System through Software-Defined Radio
Authors:
Junsung Choi,
Dongryul Park,
Suil Kim,
Seungyoung Ahn
Abstract:
With the development of electromagnetic weapons, Electronic Warfare (EW) has been emerging as the future form of warfare. In wireless communications in particular, high-security defense systems such as Low Probability of Detection (LPD), Low Probability of Interception (LPI), and Low Probability of Exploitation (LPE) communication algorithms are studied to prevent the loss of military forces. Physical-layer security, one such LPD, LPI, and LPE communication approach, has been discussed and studied. We propose a noise signaling system, a type of physical-layer security, which modifies conventionally modulated I/Q data into a noise-like shape. To demonstrate the feasibility of a realistic implementation, we use Software-Defined Radio (SDR). Since hardware imposes certain limitations, we present the limitations, requirements, and preferences for a practical implementation of a noise signaling system; the proposed system uses ring-shaped signaling. We present the ring-shaped signaling system algorithm, the SDR implementation methodology, and performance evaluations of the system in terms of Bit Error Rate (BER) and Probability of Modulation Identification (PMI), the latter obtained with a Convolutional Neural Network (CNN) algorithm. We conclude that the ring-shaped signaling system can provide a high level of LPI/LPE communication because an eavesdropper cannot obtain the correct modulation scheme, and that its performance varies with the configuration of the I/Q data modification factors.
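As a hypothetical illustration of reshaping modulated I/Q data into a ring, the sketch below starts from Gray-coded QPSK, keeps the quadrant (which carries the bits), fixes the amplitude, and randomizes the phase within the quadrant. The transmitted constellation then looks like a ring rather than four clusters, while a receiver that knows the rule decodes by quadrant. This is an assumed toy scheme for intuition only; the abstract does not specify the authors' ring-shaped signaling algorithm, and all names and parameters here are hypothetical.

```python
import cmath
import math
import random

# Gray-coded QPSK: bit pair -> quadrant index (counterclockwise from the first quadrant)
QUADRANT = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
BITS = {q: b for b, q in QUADRANT.items()}

def ring_modulate(bits, radius=1.0, margin=0.1, rng=random):
    """Map bit pairs to points on a circle; the phase is random inside the quadrant."""
    symbols = []
    for i in range(0, len(bits), 2):
        q = QUADRANT[(bits[i], bits[i + 1])]
        # Uniform phase inside quadrant q, kept `margin` radians away from the I/Q axes
        phase = q * math.pi / 2 + rng.uniform(margin, math.pi / 2 - margin)
        symbols.append(cmath.rect(radius, phase))
    return symbols

def ring_demodulate(symbols):
    """Recover bit pairs from the quadrant of each received symbol."""
    bits = []
    for s in symbols:
        phase = cmath.phase(s) % (2 * math.pi)  # map (-pi, pi] to [0, 2*pi)
        q = int(phase // (math.pi / 2)) % 4
        bits.extend(BITS[q])
    return bits
```

The `margin` parameter trades off how uniform the ring looks against noise robustness near the quadrant boundaries, loosely analogous to the I/Q modification factors whose configuration the abstract says affects performance.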
Submitted 17 November, 2021;
originally announced November 2021.
-
Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models
Authors:
Ji Won Yoon,
Hyung Yong Kim,
Hyeonseung Lee,
Sunghwan Ahn,
Nam Soo Kim
Abstract:
Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ a teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely Oracle Teacher, that leverages both the source inputs and the output labels as the teacher model's input. Since the Oracle Teacher learns a more accurate CTC alignment by referring to the target information, it can provide the student with better guidance. One potential risk of the proposed approach is a trivial solution in which the model's output directly copies the target input. Based on the many-to-one mapping property of the CTC algorithm, we present a training strategy that can effectively prevent the trivial solution and thus enables the use of both source and target inputs for model training. Extensive experiments are conducted on two sequence learning tasks: speech recognition and scene text recognition. From the experimental results, we empirically show that the proposed model improves the students across these tasks while achieving a considerable speed-up in the teacher model's training time.
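The many-to-one mapping the abstract relies on is the standard CTC collapse function: merge consecutive repeated labels, then remove blanks. The minimal sketch below assumes `"-"` as the blank symbol; it shows the mapping itself, not the Oracle Teacher training strategy built on it.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol for this sketch

def ctc_collapse(path, blank=BLANK):
    """CTC many-to-one mapping: merge consecutive repeats, then drop blanks."""
    return [label for label, _ in groupby(path) if label != blank]
```

For example, the frame-level paths `"c-aat"` and `"ccat-"` both collapse to `['c', 'a', 't']`, while `"cc-c"` collapses to `['c', 'c']` because the blank separates the two repeated labels. Because many alignments map to the same label sequence, a frame-level output that copies the target cannot simply be read off one fixed alignment.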
Submitted 11 August, 2023; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Predicting Vehicles' Longitudinal Trajectories and Lane Changes on Highway On-Ramps
Authors:
Nachuan Li,
Riley Fischer,
Wissam Kontar,
Soyoung Ahn
Abstract:
Vehicles on highway on-ramps are one of the leading contributors to congestion. In this paper, we propose a prediction framework that predicts the longitudinal trajectories and lane changes (LCs) of vehicles on highway on-ramps and tapers. Specifically, our framework adopts a combination of prediction models that takes a 4-second trajectory as input and outputs a forecast of the longitudinal trajectories and LCs up to 15 seconds ahead. Training and validation based on Next Generation Simulation (NGSIM) data show that the developed model outperforms a traditional long short-term memory (LSTM) model in prediction power and accuracy. Ultimately, the work presented here can help alleviate the congestion experienced on on-ramps, improve safety, and guide effective traffic control strategies.
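The 4-second input and 15-second horizon translate into fixed-length training windows. The sketch below assumes a 10 Hz sampling rate (40 input steps, 150 output steps) and a per-vehicle trajectory stored as a list of samples; both the rate and the `make_windows` helper are assumptions for illustration, not the authors' preprocessing.

```python
def make_windows(trajectory, hz=10, input_s=4.0, horizon_s=15.0, stride=1):
    """Slice one vehicle's trajectory into (past, future) training pairs."""
    n_in = round(input_s * hz)     # e.g. 40 samples of history
    n_out = round(horizon_s * hz)  # e.g. 150 samples to forecast
    pairs = []
    for start in range(0, len(trajectory) - n_in - n_out + 1, stride):
        past = trajectory[start:start + n_in]
        future = trajectory[start + n_in:start + n_in + n_out]
        pairs.append((past, future))
    return pairs
```

A trajectory must span at least 19 s at this rate to yield a single pair, which matters on short on-ramp segments.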
Submitted 23 August, 2021;
originally announced August 2021.
-
NTIRE 2021 Challenge on Perceptual Image Quality Assessment
Authors:
Jinjin Gu,
Haoming Cai,
Chao Dong,
Jimmy S. Ren,
Yu Qiao,
Shuhang Gu,
Radu Timofte,
Manri Cheon,
Sungjun Yoon,
Byungyeon Kang,
Junwoo Lee,
Qing Zhang,
Haiyang Guo,
Yi Bin,
Yuqing Hou,
Hengliang Luo,
**gyu Guo,
Zirui Wang,
Hai Wang,
Wenming Yang,
Qingyan Bai,
Shuwei Shi,
Weihao Xia,
Mingdeng Cao,
Jiahao Wang
, et al. (25 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These output images have completely different characteristics from traditional distortions, and thus pose a new challenge for IQA methods that evaluate their visual quality. In comparison with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores. These datasets can therefore be used to develop and evaluate IQA methods on GAN-based distortions. The challenge had 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them achieved much better results than existing IQA methods, and the winning method demonstrated state-of-the-art performance.
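IQA methods are commonly compared by how well their predicted scores agree with subjective scores, typically using Spearman's rank correlation (SRCC) and Pearson's linear correlation (PLCC). The sketch below implements both in plain Python, with average ranks for ties; the abstract does not state the challenge's exact scoring protocol, and some protocols fit a nonlinear mapping before computing PLCC, which is omitted here.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

SRCC rewards any monotone agreement with the subjective scores, while PLCC also penalizes nonlinearity, so a metric can have SRCC of 1 but PLCC below 1.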
Submitted 28 June, 2021; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Adaptive Gradient Balancing for Undersampled MRI Reconstruction and Image-to-Image Translation
Authors:
Itzik Malkiel,
Sangtae Ahn,
Valentina Taviani,
Anne Menini,
Lior Wolf,
Christopher J. Hardy
Abstract:
Recent accelerated MRI reconstruction models have used Deep Neural Networks (DNNs) to reconstruct relatively high-quality images from highly undersampled k-space data, enabling much faster MRI scanning. However, these techniques sometimes struggle to reconstruct sharp images that preserve fine detail while maintaining a natural appearance. In this work, we enhance the image quality by using a Conditional Wasserstein Generative Adversarial Network combined with a novel Adaptive Gradient Balancing (AGB) technique that automates the process of combining the adversarial and pixel-wise terms and streamlines hyperparameter tuning. In addition, we introduce a Densely Connected Iterative Network, an undersampled MRI reconstruction network that uses dense connections. In MRI, our method minimizes artifacts while maintaining a high-quality reconstruction that produces sharper images than other techniques. To demonstrate the generality of our method, we further evaluate it on a battery of image-to-image translation experiments, which demonstrate its ability to recover from sub-optimal weighting in multi-term adversarial training.
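The general idea of automatically balancing a pixel-wise loss against an adversarial loss can be illustrated by weighting the adversarial gradient so that its norm stays a fixed fraction of the pixel-loss gradient norm. The rule below (running averages of gradient norms and a target `ratio`) is a hypothetical, generic balancing scheme written for illustration; it is not the AGB algorithm from the paper, whose exact update is not given in the abstract.

```python
import numpy as np

class GradientBalancer:
    """Hypothetical rule: keep the weighted adversarial gradient norm at
    `ratio` times the pixel-loss gradient norm, using running averages."""

    def __init__(self, ratio=0.1, momentum=0.9, eps=1e-12):
        self.ratio = ratio
        self.momentum = momentum
        self.eps = eps
        self.pix_norm = None
        self.adv_norm = None

    def combine(self, grad_pixel, grad_adv):
        gp = float(np.linalg.norm(grad_pixel))
        ga = float(np.linalg.norm(grad_adv))
        if self.pix_norm is None:  # first step: initialize the running norms
            self.pix_norm, self.adv_norm = gp, ga
        else:  # exponential moving averages smooth out noisy adversarial gradients
            m = self.momentum
            self.pix_norm = m * self.pix_norm + (1 - m) * gp
            self.adv_norm = m * self.adv_norm + (1 - m) * ga
        weight = self.ratio * self.pix_norm / (self.adv_norm + self.eps)
        return grad_pixel + weight * grad_adv, weight
```

Replacing a hand-tuned adversarial weight with a ratio of gradient magnitudes makes the trade-off less sensitive to the arbitrary scale of each loss term, which is the kind of hyperparameter burden the abstract says AGB addresses.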
Submitted 5 April, 2021;
originally announced April 2021.
-
DSRC-Enabled Train Safety Communication System at Unmanned Crossings
Authors:
Junsung Choi,
Vuk Marojevic,
Carl B. Dietrich,
Seungyoung Ahn
Abstract:
Although wireless technology is available for safety-critical applications, few applications have been used to improve train crossing safety. To prevent potential collisions between trains and vehicles, we present a Dedicated Short-Range Communication (DSRC)-enabled train safety communication system intended for deployment at unmanned crossings. Since the purpose of our application is to prevent collisions between trains and vehicles, we present a method to calculate the minimum required warning time for a head-to-head collision at the train crossing. Furthermore, we define the best- and worst-case scenarios and provide practical measurements at six operating crossings in the U.S. under numerous system configurations, varying the modulation scheme, transmission power, antenna type, train speed, and vehicle braking distance. From our measurements, we find that the warning application's coverage range is independent of the train speed, that an omnidirectional antenna with high transmission power is the best configuration for our system, and that latency values are mostly below 5 ms. We use the radio communication coverage to evaluate the time to avoid collision and introduce a safeness level metric. From the measured data, we observe that the DSRC-enabled train safety communication system is feasible for train speeds up to 35 mph, providing more than 25-30 s to avoid a collision for vehicle speeds of 25-65 mph. Higher train speeds are expected to be safe, but more measurements beyond the 200 m mark relative to the crossings considered here are needed for a definitive conclusion.
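Because the coverage range is independent of train speed, the warning time available to a vehicle shrinks as the train speeds up. The sketch below computes the warning time as coverage distance divided by train speed, assuming constant speed and that the first received message arrives when the train enters coverage. The `time_margin_s` function is a hypothetical margin (warning time minus latency, driver reaction, and vehicle braking time) for illustration; it is not the paper's time-to-avoid-collision or safeness level definition, and the example numbers below are not measured values.

```python
MPH_TO_MPS = 0.44704  # exact: 1 mile = 1609.344 m, 1 h = 3600 s

def warning_time_s(coverage_m, train_mph):
    """Seconds from the train entering radio coverage to reaching the crossing."""
    return coverage_m / (train_mph * MPH_TO_MPS)

def time_margin_s(coverage_m, train_mph, latency_s, vehicle_mph,
                  decel_mps2, reaction_s=1.0):
    """Hypothetical spare time after latency, reaction, and braking to a stop."""
    braking_s = vehicle_mph * MPH_TO_MPS / decel_mps2  # time to stop at constant deceleration
    return warning_time_s(coverage_m, train_mph) - latency_s - reaction_s - braking_s
```

For instance, with an assumed 447.04 m of coverage, a 10 mph train gives 100 s of warning, whereas doubling the train speed halves it; millisecond-level latency is negligible by comparison.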
Submitted 4 April, 2021;
originally announced April 2021.
-
Real-time Monitoring of Autonomous Vehicle's Time Gap Variations: A Bayesian Framework
Authors:
Wissam Kontar,
Soyoung Ahn
Abstract:
This paper proposes a novel monitoring methodology for car-following control of automated vehicles that uses real-time measurements of spacing and velocity obtained through vehicle sensors. This study focuses on monitoring the time gap, a key parameter that dictates the desired following spacing of the controlled vehicle. The goal is to monitor deviations of the actual time gap from a desired setting and detect when it deviates beyond a control limit. A random coefficient model is developed to systematically capture the stochastic distribution of the time gap and derive a closed-form Bayesian updating scheme for real-time inference. A control chart is then adopted to systematically set the control limits and indicate when the time gap setting should be changed. Simulation experiments are performed to demonstrate the effectiveness of the proposed method for monitoring the time gap and alerting when the parameter setting needs to be changed.
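A closed-form Bayesian update for a time gap can be sketched with a conjugate Gaussian model. Assume (hypothetically) a constant-time-gap spacing policy s = d0 + τ·v + ε with Gaussian noise ε of known variance and a Gaussian prior on τ; then each (spacing, speed) measurement updates the posterior mean and variance in closed form. The fixed control limits around the target are also an assumption; the paper's random coefficient model and control chart may differ.

```python
class TimeGapMonitor:
    """Hypothetical conjugate-Gaussian monitor for s = d0 + tau * v + noise."""

    def __init__(self, prior_mean, prior_var, noise_var, standstill_m,
                 target_s, limit_s):
        self.mean = prior_mean      # posterior mean of the time gap tau (s)
        self.var = prior_var        # posterior variance of tau (s^2)
        self.noise_var = noise_var  # spacing measurement noise variance (m^2)
        self.d0 = standstill_m      # standstill distance (m)
        self.target = target_s      # desired time gap setting (s)
        self.limit = limit_s        # half-width of the control band (s)

    def update(self, spacing_m, speed_mps):
        """One closed-form Bayesian update from a spacing/speed measurement."""
        prec = 1.0 / self.var + speed_mps ** 2 / self.noise_var
        self.mean = (self.mean / self.var
                     + speed_mps * (spacing_m - self.d0) / self.noise_var) / prec
        self.var = 1.0 / prec
        return self.mean, self.var

    def out_of_control(self):
        """Alert when the posterior mean leaves the control band."""
        return abs(self.mean - self.target) > self.limit
```

Since the update only needs the current mean and variance, it runs in constant time per sensor reading, which suits real-time monitoring.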
Submitted 30 January, 2021;
originally announced February 2021.
-
Survey of Spectrum Regulation for Intelligent Transportation Systems
Authors:
Junsung Choi,
Vuk Marojevic,
Carl B. Dietrich,
Jeffrey H. Reed,
Seungyoung Ahn
Abstract:
As 5G communication technology develops, vehicular communications that require high reliability, low latency, and massive connectivity are drawing increasing interest from academia and industry. Due to these developing technologies, vehicular communication is no longer limited to vehicle components in the form of Vehicle-to-Vehicle (V2V) or Vehicle-to-Infrastructure (V2I) networks, but has also been extended to connect with others, such as pedestrians and cellular users. Dedicated Short-Range Communications (DSRC) is the conventional vehicular communication standard for Intelligent Transportation Systems (ITS). More recently, the 3rd Generation Partnership Project (3GPP) introduced Cellular-Vehicle-to-Everything (C-V2X), a competitor to DSRC. Meanwhile, the Federal Communications Commission (FCC) issued a Notice of Proposed Rulemaking (NPRM) to consider deploying Unlicensed National Information Infrastructure (U-NII) devices in the ITS band with two interference mitigation approaches: Detect-and-Vacate (DAV) and Re-channelization (Re-CH). With multiple standard options and interference mitigation approaches, numerous regulatory taxonomies can be identified, each raising relevant technical challenges. However, these challenges are much broader than the current and future regulatory taxonomies pursued by the different countries involved. Because their plans differ, the technical and regulatory challenges vary. This paper presents a literature survey of the technical challenges, the current and future ITS band usage plans, and the major research testbeds for the U.S., Europe, China, Korea, and Japan. This survey shows that the most likely deployment taxonomies are (1) DSRC, C-V2X, and Wi-Fi with Re-CH; (2) DSRC and C-V2X with interoperation; and (3) C-V2X only. The most difficult technical challenge is the interoperability between the Wi-Fi-like DSRC and the 4G LTE-like C-V2X.
Submitted 3 August, 2020; v1 submitted 26 June, 2020;
originally announced August 2020.
-
A Novel Approach for Correcting Multiple Discrete Rigid In-Plane Motions Artefacts in MRI Scans
Authors:
Michael Rotman,
Rafi Brada,
Israel Beniaminy,
Sangtae Ahn,
Christopher J. Hardy,
Lior Wolf
Abstract:
Motion artefacts created by patient motion during an MRI scan occur frequently in practice, often rendering the scans clinically unusable and requiring a re-scan. While many methods have been employed to ameliorate the effects of patient motion, these often fall short in practice. In this paper we propose a novel method for removing motion artefacts using a deep neural network with two input branches that discriminates between patient poses using the motion's timing. The first branch receives a subset of the $k$-space data collected during a single patient pose, and the second branch receives the remaining part of the collected $k$-space data. The proposed method can be applied to artefacts generated by multiple movements of the patient. Furthermore, it can be used to correct motion for the case where $k$-space has been under-sampled, to shorten the scan time, as is common when using methods such as parallel imaging or compressed sensing. Experimental results on both simulated and real MRI data show the efficacy of our approach.
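The two-branch input described above can be sketched as a split of the acquired k-space by acquisition time. Assuming a Cartesian acquisition with one timestamp per phase-encode line and known motion event times (possibly several, as the abstract allows), each line is assigned a pose index; one branch gets the lines from the chosen pose and the other gets the rest. Unacquired lines in undersampled data are simply zeros in both outputs. The function name and this exact data layout are hypothetical, not the authors' implementation.

```python
import numpy as np

def split_kspace(kspace, line_times, motion_times, pose):
    """Split phase-encode lines (rows) into one pose's lines and the remainder."""
    # Pose index of each line = number of motion events at or before its acquisition time
    pose_of_line = np.searchsorted(np.sort(np.asarray(motion_times)),
                                   np.asarray(line_times), side="right")
    in_pose = (pose_of_line == pose)[:, None]  # broadcast along the readout axis
    branch_pose = np.where(in_pose, kspace, 0)
    branch_rest = np.where(in_pose, 0, kspace)
    return branch_pose, branch_rest
```

By construction the two branch inputs sum to the original k-space, so no acquired data is discarded; the network only receives the timing information implicitly through which lines land in which branch.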
Submitted 29 June, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Attenuation Coefficient Estimation for PET/MRI With Bayesian Deep Learning pseudo-CT and Maximum Likelihood Estimation of Activity and Attenuation
Authors:
Andrew P. Leynes,
Sangtae P. Ahn,
Kristen A. Wangerin,
Sandeep S. Kaushik,
Florian Wiesinger,
Thomas A. Hope,
Peder E. Z. Larson
Abstract:
A major remaining challenge for magnetic resonance-based attenuation correction methods (MRAC) is their susceptibility to sources of MRI artifacts (e.g., implants, motion) and uncertainties due to the limitations of MRI contrast (e.g., accurate bone delineation and density, and separation of air and bone). We propose using a Bayesian deep convolutional neural network that, in addition to generating an initial pseudo-CT from MR data, also produces uncertainty estimates of the pseudo-CT to quantify the limitations of the MR data. These outputs are combined with maximum likelihood estimation of activity and attenuation (MLAA) reconstruction, which uses the PET emission data to improve the attenuation maps. With the proposed approach (UpCT-MLAA), we demonstrate accurate estimation of PET uptake in pelvic lesions and show recovery of metal implants. In patients without implants, UpCT-MLAA had acceptable but slightly higher RMSE than Zero-echo-time and Dixon Deep pseudo-CT when compared to CTAC. In patients with metal implants, MLAA recovered the metal implant; however, anatomy outside the implant region was obscured by noise and crosstalk artifacts. Attenuation coefficients from the pseudo-CT from Dixon MRI were accurate in normal anatomy; however, the metal implant region was estimated to have the attenuation coefficients of air. UpCT-MLAA estimated the attenuation coefficients of metal implants alongside accurate anatomic depiction outside of implant regions.
Submitted 13 October, 2021; v1 submitted 10 January, 2020;
originally announced January 2020.
-
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
Authors:
Zhixuan Lin,
Yi-Fu Wu,
Skand Vishwanath Peri,
Weihao Sun,
Gautam Singh,
Fei Deng,
Jindong Jiang,
Sungjin Ahn
Abstract:
The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieving higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are based on either spatial attention or scene mixture and are limited in scalability, which is a main obstacle to modeling real-world scenes. In this paper, we propose a generative latent variable model, called SPACE, that provides a unified probabilistic modeling framework combining the best of the spatial-attention and scene-mixture approaches. SPACE can explicitly provide factorized object representations for foreground objects while also decomposing background segments of complex morphology. Previous models are good at one of these, but not both. SPACE also resolves the scalability problems of previous methods by incorporating parallel spatial attention, and is thus applicable to scenes with a large number of objects without performance degradation. We show through experiments on Atari and 3D-Rooms that SPACE achieves the above properties consistently in comparison to SPAIR, IODINE, and GENESIS. Results of our experiments can be found on our project website: https://sites.google.com/view/space-project-page
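Parallel spatial attention scales because every cell of a G×G grid proposes at most one object, and all cells are decoded in one vectorized operation instead of attending to objects one at a time. The sketch below decodes per-cell presence, shift, and scale logits into bounding boxes in normalized image coordinates. The parametrization (tanh-bounded shift of up to one cell, sigmoid scale, presence threshold) is an assumption in the spirit of SPAIR-style models, not necessarily SPACE's exact formulation.

```python
import numpy as np

def decode_boxes(shift_logits, scale_logits, pres_logits, threshold=0.5):
    """Decode all G x G cells at once into (center, size) boxes in [0, 1] coordinates.

    shift_logits, scale_logits: (G, G, 2) arrays of (x, y) logits; pres_logits: (G, G).
    """
    g = shift_logits.shape[0]
    cell = 1.0 / g
    iy, ix = np.meshgrid(np.arange(g), np.arange(g), indexing="ij")
    centers = np.stack([(ix + 0.5) * cell, (iy + 0.5) * cell], axis=-1)  # cell centers
    centers = centers + np.tanh(shift_logits) * cell      # shift by at most one cell
    sizes = 1.0 / (1.0 + np.exp(-scale_logits))           # box size as image fraction
    present = 1.0 / (1.0 + np.exp(-pres_logits)) > threshold
    return centers[present], sizes[present]
```

The cost of this step grows with the number of cells rather than with a sequential loop over detected objects, which is the property that lets parallel attention handle scenes with many objects.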
Submitted 15 March, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
Conditional WGANs with Adaptive Gradient Balancing for Sparse MRI Reconstruction
Authors:
Itzik Malkiel,
Sangtae Ahn,
Valentina Taviani,
Anne Menini,
Lior Wolf,
Christopher J. Hardy
Abstract:
Recent sparse MRI reconstruction models have used Deep Neural Networks (DNNs) to reconstruct relatively high-quality images from highly undersampled k-space data, enabling much faster MRI scanning. However, these techniques sometimes struggle to reconstruct sharp images that preserve fine detail while maintaining a natural appearance. In this work, we enhance the image quality by using a Conditional Wasserstein Generative Adversarial Network combined with a novel Adaptive Gradient Balancing technique that stabilizes the training and minimizes the degree of artifacts, while maintaining a high-quality reconstruction that produces sharper images than other techniques.
Submitted 2 May, 2019;
originally announced May 2019.