Artificial Intelligence for Neuro MRI Acquisition: A Review

Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potential in enhancing the efficiency and throughput of acquisition steps. This review discusses several pivotal AI-based methods in neuro MRI acquisition, focusing on their technological advances, impact on clinical practice, and potential risks.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Submitted to MAGMA for review

arXiv:2404.19167 [pdf]

Advancing low-field MRI with a universal denoising imaging transformer: Towards fast and high-quality imaging

Authors: Zheren Zhu, Azaan Rehman, Xiaozhi Cao, Congyu Liao, Yoo Jin Lee, Michael Ohliger, Hui Xue, Yang Yang

Abstract: Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-to-noise ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models with constrained data diversity and quantity, which exhibit limited generalizability in clinical applications across diverse MRI systems, pulse sequences, and organs. In this study, we present ImT-MRD: a complex-valued imaging transformer trained on a vast number of clinical MRI scans aiming at universal MR denoising at LF systems. Compared with averaging multiple-repeated scans for higher image SNR, the model obtains better image quality from fewer repetitions, demonstrating its capability for accelerating scans under various clinical settings. Moreover, with its complex-valued image input, the model can denoise intermediate results before advanced post-processing and prepare high-quality data for further MRI research. By delivering universal and accurate denoising across clinical and research tasks, our model holds great promise to expedite the evolution of LF MRI for accessible and equal biomedical applications.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2401.12890 [pdf, other]

An Efficient Algorithm for Spatial-Spectral Partial Volume Compartment Mapping with Applications to Multicomponent Diffusion and Relaxation MRI

Authors: Yunsong Liu, Debdut Mandal, Congyu Liao, Kawin Setsompop, Justin P. Haldar

Abstract: Previous work has shown that high-quality partial volume tissue compartment maps can be obtained by applying spatially-regularized spectroscopic image estimation techniques to high-dimensional multicontrast MRI data (e.g., diffusion-relaxation data). However, this generally requires substantial computational complexity. In this work, we propose a more efficient algorithm to estimate spectroscopic images using spatial regularization. Our algorithm is based on the linearized alternating directions method of multipliers (LADMM), and relies on the introduction of novel quadratic penalty terms to substantially simplify the subproblems that appear within each iteration. We evaluate this algorithm in a variety of different scenarios (diffusion-relaxation, relaxation-relaxation, relaxometry, and magnetic resonance fingerprinting), where we consistently observe substantial ($\sim$3$\times$-50$\times$) speed improvements. We expect that this new algorithm will reduce barriers to using spatial regularization and multiparametric contrast-encoded MRI data acquisition methods for partial volume compartment mapping.

Submitted 5 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2312.13523 [pdf]

doi 10.1002/mrm.29990

High-resolution myelin-water fraction and quantitative relaxation mapping using 3D ViSTa-MR fingerprinting

Authors: Congyu Liao, Xiaozhi Cao, Siddharth Srinivasan Iyer, Sophie Schauman, Zihan Zhou, Xiaoqian Yan, Quan Chen, Zhitao Li, Nan Wang, Ting Gong, Zhe Wu, Hongjian He, Jianhui Zhong, Yang Yang, Adam Kerr, Kalanit Grill-Spector, Kawin Setsompop

Abstract: Purpose: This study aims to develop a high-resolution whole-brain multi-parametric quantitative MRI approach for simultaneous mapping of myelin-water fraction (MWF), T1, T2, and proton-density (PD), all within a clinically feasible scan time. Methods: We developed 3D ViSTa-MRF, which combined Visualization of Short Transverse relaxation time component (ViSTa) technique with MR Fingerprinting (MRF), to achieve high-fidelity whole-brain MWF and T1/T2/PD mapping on a clinical 3T scanner. To achieve fast acquisition and memory-efficient reconstruction, the ViSTa-MRF sequence leverages an optimized 3D tiny-golden-angle-shuffling spiral-projection acquisition and joint spatial-temporal subspace reconstruction with optimized preconditioning algorithm. With the proposed ViSTa-MRF approach, high-fidelity direct MWF mapping was achieved without a need for multi-compartment fitting that could introduce bias and/or noise from additional assumptions or priors. Results: The in-vivo results demonstrate the effectiveness of the proposed acquisition and reconstruction framework to provide fast multi-parametric mapping with high SNR and good quality. The in-vivo results of 1mm- and 0.66mm-iso datasets indicate that the MWF values measured by the proposed method are consistent with standard ViSTa results that are 30x slower with lower SNR. Furthermore, we applied the proposed method to enable 5-minute whole-brain 1mm-iso assessment of MWF and T1/T2/PD mappings for infant brain development and for post-mortem brain samples. Conclusions: In this work, we have developed a 3D ViSTa-MRF technique that enables the acquisition of whole-brain MWF, quantitative T1, T2, and PD maps at 1mm and 0.66mm isotropic resolution in 5 and 15 minutes, respectively. This advancement allows for quantitative investigations of myelination changes in the brain.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 38 pages, 12 figures and 1 table

Journal ref: Magnetic Resonance in Medicine 2023

arXiv:2310.10823 [pdf, other]

Implicit Representation of GRAPPA Kernels for Fast MRI Reconstruction

Authors: Daniel Abraham, Mark Nishimura, Xiaozhi Cao, Congyu Liao, Kawin Setsompop

Abstract: MRI data is acquired in Fourier space/k-space. Data acquisition is typically performed on a Cartesian grid in this space to enable the use of a fast Fourier transform algorithm to achieve fast and efficient reconstruction. However, it has been shown that for multiple applications, non-Cartesian data acquisition can improve the performance of MR imaging by providing fast and more efficient data acquisition, and improving motion robustness. Nonetheless, the image reconstruction process of non-Cartesian data is more involved and can be time-consuming, even through the use of efficient algorithms such as non-uniform FFT (NUFFT). Reconstruction complexity is further exacerbated when imaging in the presence of field imperfections. This work (implicit GROG) provides an efficient approach to transform the field corrupted non-Cartesian data into clean Cartesian data, to achieve simpler and faster reconstruction which should help enable non-Cartesian data sampling to be performed more widely in MRI.

Submitted 14 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2306.16036 [pdf, other]

A Cascaded Approach for ultraly High Performance Lesion Detection and False Positive Removal in Liver CT Scans

Authors: Fakai Wang, Chi-Tung Cheng, Chien-Wei Peng, Ke Yan, Min Wu, Le Lu, Chien-Hung Liao, Ling Zhang

Abstract: Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images has the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and the complexities of tumor types or subtypes. In this work, we customize a multi-object labeling tool for multi-phase CT images, which is used to curate a large-scale dataset containing 1,631 patients with four-phase CT images, multi-organ masks, and multi-lesion (six major types of liver lesions confirmed by pathology) masks. We develop a two-stage liver lesion detection pipeline, where the high-sensitivity detecting algorithms in the first stage discover as many lesion proposals as possible, and the lesion-reclassification algorithms in the second stage remove as many false alarms as possible. The multi-sensitivity lesion detection algorithm maximizes the information utilization of the individual probability maps of segmentation, and the lesion-shuffle augmentation effectively explores the texture contrast between lesions and the liver. Independently tested on 331 patient cases, the proposed model achieves high sensitivity and specificity for malignancy classification in the multi-phase contrast-enhanced CT (99.2%, 97.1%, diagnosis setting) and in the noncontrast CT (97.3%, 95.7%, screening setting).

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2304.00474 [pdf, ps, other]

On the Optimal Recovery of Graph Signals

Authors: Simon Foucart, Chunyang Liao, Nate Veldt

Abstract: Learning a smooth graph signal from partially observed data is a well-studied task in graph-based machine learning. We consider this task from the perspective of optimal recovery, a mathematical framework for learning a function from observational data that adopts a worst-case perspective tied to model assumptions on the function to be learned. Earlier work in the optimal recovery literature has shown that minimizing a regularized objective produces optimal solutions for a general class of problems, but did not fully identify the regularization parameter. Our main contribution provides a way to compute regularization parameters that are optimal or near-optimal (depending on the setting), specifically for graph signal processing problems. Our results offer a new interpretation for classical optimization techniques in graph-based learning and also come with new insights for hyperparameter selection. We illustrate the potential of our methods in numerical experiments on several semi-synthetic graph signal processing datasets.

Submitted 29 May, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: This paper has been accepted by the 14th International Conference on Sampling Theory and Applications (SampTA 2023)
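
The "regularized objective" mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common Laplacian-regularized least-squares form, minimizing the squared error on observed nodes plus tau times x^T L x, on a small path graph. The function names, the toy graph, and the hand-picked value of tau are illustrative assumptions; this is not the paper's procedure for choosing the regularization parameter.

```python
def path_laplacian(n):
    """Dense graph Laplacian (list of lists) of a path graph with n nodes."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def recover(lap, observed, tau):
    """Minimize sum over observed i of (x_i - y_i)^2 + tau * x^T L x.

    Setting the gradient to zero gives (S + tau * L) x = S y,
    where S is the diagonal 0/1 sampling mask.
    """
    n = len(lap)
    a = [[tau * lap[i][j] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for i, y in observed.items():
        a[i][i] += 1.0
        b[i] = y
    return solve(a, b)


# Hypothetical example: only the two end nodes of a 5-node path are observed.
signal = recover(path_laplacian(5), {0: 0.0, 4: 4.0}, tau=0.1)
```

On this toy graph the unobserved interior nodes are filled in by linear interpolation between the (slightly shrunken) endpoint values, which is the smoothness behavior the Laplacian penalty encodes.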

arXiv:2301.02732 [pdf]

doi 10.1109/BigData55660.2022.10021009

Multimodal Lyrics-Rhythm Matching

Authors: Callie C. Liao, Duoduo Liao, Jesse Guessford

Abstract: Despite the recent increase in research on artificial intelligence for music, prominent correlations between key components of lyrics and rhythm such as keywords, stressed syllables, and strong beats are not frequently studied. This is likely due to challenges such as audio misalignment, inaccuracies in syllabic identification, and most importantly, the need for cross-disciplinary knowledge. To address this lack of research, we propose a novel multimodal lyrics-rhythm matching approach in this paper that specifically matches key components of lyrics and music with each other without any language limitations. We use audio instead of sheet music with readily available metadata, which creates more challenges yet increases the application flexibility of our method. Furthermore, our approach creatively generates several patterns involving various multimodalities, including music strong beats, lyrical syllables, auditory changes in a singer's pronunciation, and especially lyrical keywords, which are utilized for matching key lyrical elements with key rhythmic elements. This advantageous approach not only provides a unique way to study auditory lyrics-rhythm correlations including efficient rhythm-based audio alignment algorithms, but also bridges computational linguistics with music as well as music cognition. Our experimental results reveal a 0.81 probability of matching on average, and around 30% of the songs have a probability of 0.9 or higher of keywords landing on strong beats, including 12% of the songs with a perfect landing. Also, the similarity metrics are used to evaluate the correlation between lyrics and rhythm. They show that nearly 50% of the songs have 0.70 similarity or higher. In conclusion, our approach contributes significantly to the lyrics-rhythm relationship by computationally unveiling insightful correlations.

Submitted 14 March, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: Accepted by 2022 IEEE International Conference on Big Data (IEEE Big Data 2022)

arXiv:2212.00687 [pdf]

3D-EPI Blip-Up/Down Acquisition (BUDA) with CAIPI and Joint Hankel Structured Low-Rank Reconstruction for Rapid Distortion-Free High-Resolution T2* Mapping

Authors: Zhifeng Chen, Congyu Liao, Xiaozhi Cao, Benedikt A. Poser, Zhongbiao Xu, Wei-Ching Lo, Manyi Wen, Jaejin Cho, Qiyuan Tian, Yaohui Wang, Yanqiu Feng, Ling Xia, Wufan Chen, Feng Liu, Berkin Bilgic

Abstract: Purpose: This work aims to develop a novel distortion-free 3D-EPI acquisition and image reconstruction technique for fast and robust, high-resolution, whole-brain imaging as well as quantitative T2* mapping. Methods: 3D-Blip-Up and -Down Acquisition (3D-BUDA) sequence is designed for both single- and multi-echo 3D GRE-EPI imaging using multiple shots with blip-up and -down readouts to encode B0 field map information. Complementary k-space coverage is achieved using controlled aliasing in parallel imaging (CAIPI) sampling across the shots. For image reconstruction, an iterative hard-thresholding algorithm is employed to minimize the cost function that combines field map information informed parallel imaging with the structured low-rank constraint for multi-shot 3D-BUDA data. Extending 3D-BUDA to multi-echo imaging permits T2* mapping. For this, we propose constructing a joint Hankel matrix along both echo and shot dimensions to improve the reconstruction. Results: Experimental results on in vivo multi-echo data demonstrate that, by performing joint reconstruction along with both echo and shot dimensions, reconstruction accuracy is improved compared to standard 3D-BUDA reconstruction. CAIPI sampling is further shown to enhance the image quality. For T2* mapping, T2* values from 3D-Joint-CAIPI-BUDA and reference multi-echo GRE are within limits of agreement as quantified by Bland-Altman analysis. Conclusions: The proposed technique enables rapid 3D distortion-free high-resolution imaging and T2* mapping. Specifically, 3D-BUDA enables 1-mm isotropic whole-brain imaging in 22 s at 3 T and 9 s on a 7 T scanner. The combination of multi-echo 3D-BUDA with CAIPI acquisition and joint reconstruction enables distortion-free whole-brain T2* mapping in 47 s at 1.1x1.1x1.0 mm3 resolution.

Submitted 1 December, 2022; originally announced December 2022.
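
As background for the T2* mapping mentioned in the abstract above, the sketch below shows one generic way to estimate T2* from multi-echo magnitudes at a single voxel: a log-linear least-squares fit of the mono-exponential model S(TE) = S0 * exp(-TE / T2*). The abstract does not state which fitting method the authors used, so the model choice, the function name `fit_t2_star`, and the synthetic echo times and signal values are all assumptions for illustration.

```python
import math


def fit_t2_star(echo_times_ms, signals):
    """Log-linear least-squares fit of S(TE) = S0 * exp(-TE / T2*).

    Taking logs gives ln S = ln S0 - TE / T2*, a straight line in TE,
    so the slope of an ordinary least-squares fit equals -1 / T2*.
    Returns (S0, T2* in ms). Assumes strictly positive, decaying signals.
    """
    n = len(echo_times_ms)
    logs = [math.log(s) for s in signals]
    mean_te = sum(echo_times_ms) / n
    mean_log = sum(logs) / n
    cov = sum((te - mean_te) * (lg - mean_log)
              for te, lg in zip(echo_times_ms, logs))
    var = sum((te - mean_te) ** 2 for te in echo_times_ms)
    slope = cov / var
    s0 = math.exp(mean_log - slope * mean_te)
    return s0, -1.0 / slope


# Synthetic, noise-free single-voxel data (hypothetical values).
echo_times_ms = [5.0, 15.0, 25.0, 35.0]
signals = [100.0 * math.exp(-te / 40.0) for te in echo_times_ms]
s0, t2_star_ms = fit_t2_star(echo_times_ms, signals)
```

With noisy data a log-linear fit over-weights late, low-signal echoes, so a weighted or nonlinear fit is often preferred in practice; the sketch is only meant to show the underlying relationship.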

arXiv:2208.03970 [pdf, ps, other]

Optimized Design for IRS-Assisted Integrated Sensing and Communication Systems in Clutter Environments

Authors: Chikun Liao, Feng Wang, Vincent K. N. Lau

Abstract: In this paper, we investigate an intelligent reflecting surface (IRS)-assisted integrated sensing and communication (ISAC) system design in a clutter environment. Assisted by an IRS equipped with a uniform linear array (ULA), a multi-antenna base station (BS) is targeted for communicating with multiple communication users (CUs) and sensing multiple targets simultaneously. We consider the IRS-assisted ISAC design in the case with Type-I or Type-II CUs, where each Type-I and Type-II CU can and cannot cancel the interference from sensing signals, respectively. In particular, we aim to maximize the minimum sensing beampattern gain among multiple targets, by jointly optimizing the BS transmit beamforming vectors and the IRS phase shifting matrix, subject to the signal-to-interference-plus-noise ratio (SINR) constraint for each Type-I/Type-II CU, the interference power constraint per clutter, the transmission power constraint at the BS, and the cross-correlation pattern constraint. Due to the coupling of the BS's transmit design variables and the IRS's phase shifting matrix, the formulated max-min IRS-assisted ISAC design problem in the case with Type-I/Type-II CUs is highly non-convex. As such, we propose an efficient algorithm based on the alternating-optimization and semi-definite relaxation (SDR) techniques. In the case with Type-I CUs, we show that the dedicated sensing signal at the BS is always beneficial to improve the sensing performance. By contrast, the dedicated sensing signal at the BS is not required in the case with Type-II CUs. Numerical results are provided to show that the proposed IRS-assisted ISAC design schemes achieve a significant gain over the existing benchmark schemes.

Submitted 8 August, 2022; originally announced August 2022.

Comments: 28 pages, 9 figures, single-column full paper

arXiv:2110.04005 [pdf, other]

KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms

Authors: Chien-Feng Liao, Jen-Yu Liu, Yi-Hsuan Yang

Abstract: In this paper, we propose a novel neural network model called KaraSinger for a less-studied singing voice synthesis (SVS) task named score-free SVS, in which the prosody and melody are spontaneously decided by machine. KaraSinger comprises a vector-quantized variational autoencoder (VQ-VAE) that compresses the Mel-spectrograms of singing audio to sequences of discrete codes, and a language model (LM) that learns to predict the discrete codes given the corresponding lyrics. For the VQ-VAE part, we employ a Connectionist Temporal Classification (CTC) loss to encourage the discrete codes to carry phoneme-related information. For the LM part, we use location-sensitive attention for learning a robust alignment between the input phoneme sequence and the output discrete code. We keep the architecture of both the VQ-VAE and LM light-weight for fast training and inference speed. We validate the effectiveness of the proposed design choices using a proprietary collection of 550 English pop songs sung by multiple amateur singers. The result of a listening test shows that KaraSinger achieves high scores in intelligibility, musicality, and the overall quality.

Submitted 8 October, 2021; originally announced October 2021.

Comments: Submitted to ICASSP 2022
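
The step in which a VQ-VAE turns continuous frames into "sequences of discrete codes" can be sketched as a nearest-neighbor lookup in a codebook. The snippet below shows only that quantization step on toy 2-D vectors. In an actual VQ-VAE the inputs would be learned encoder outputs and the codebook would be trained jointly; the function name, the codebook, and the frame values here are hypothetical.

```python
def quantize(frames, codebook):
    """Map each frame vector to the index of its nearest codebook vector.

    Distance is squared Euclidean; ties resolve to the lowest index.
    """
    codes = []
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, code))
                 for code in codebook]
        codes.append(dists.index(min(dists)))
    return codes


# Toy codebook with three entries and four 2-D "frames" (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.9], [1.1, 0.8]]
codes = quantize(frames, codebook)  # -> [0, 1, 2, 1]
```

The resulting integer sequence is the kind of discrete target a language model could then be trained to predict from lyrics, as the abstract describes.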

arXiv:2108.12587 [pdf]

BUDA-SAGE with self-supervised denoising enables fast, distortion-free, high-resolution T2, T2*, para- and dia-magnetic susceptibility mapping

Authors: Zijing Zhang, Long Wang, Jaejin Cho, Congyu Liao, Hyeong-Geol Shin, Xiaozhi Cao, Jongho Lee, Jinmin Xu, Tao Zhang, Huihui Ye, Kawin Setsompop, Huafeng Liu, Berkin Bilgic

Abstract: To rapidly obtain high resolution T2, T2* and quantitative susceptibility mapping (QSM) source separation maps with whole-brain coverage and high geometric fidelity. We propose Blip Up-Down Acquisition for Spin And Gradient Echo imaging (BUDA-SAGE), an efficient echo-planar imaging (EPI) sequence for quantitative mapping. The acquisition includes multiple T2*-, T2'- and T2-weighted contrasts. We alternate the phase-encoding polarities across the interleaved shots in this multi-shot navigator-free acquisition. A field map estimated from interim reconstructions was incorporated into the joint multi-shot EPI reconstruction with a structured low rank constraint to eliminate geometric distortion. A self-supervised MR-Self2Self (MR-S2S) neural network (NN) was utilized to perform denoising after BUDA reconstruction to boost SNR. Employing Slider encoding allowed us to reach 1 mm isotropic resolution by performing super-resolution reconstruction on BUDA-SAGE volumes acquired with 2 mm slice thickness. Quantitative T2 and T2* maps were obtained using Bloch dictionary matching on the reconstructed echoes. QSM was estimated using nonlinear dipole inversion (NDI) on the gradient echoes. Starting from the estimated R2 and R2* maps, R2' information was derived and used in source separation QSM reconstruction, which provided additional para- and dia-magnetic susceptibility maps. In vivo results demonstrate the ability of BUDA-SAGE to provide whole-brain, distortion-free, high-resolution multi-contrast images and quantitative T2 and T2* maps, as well as yielding para- and dia-magnetic susceptibility maps. Derived quantitative maps showed comparable values to conventional mapping methods in phantom and in vivo measurements. BUDA-SAGE acquisition with self-supervised denoising and Slider encoding enabled rapid, distortion-free, whole-brain T2, T2* mapping at 1 mm3 isotropic resolution in 90 seconds.

Submitted 9 September, 2021; v1 submitted 28 August, 2021; originally announced August 2021.
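
The abstract above notes that R2' was derived from the estimated R2 and R2* maps. A minimal per-voxel sketch of that step follows, using the standard relation R2* = R2 + R2' with R2 = 1/T2 and R2* = 1/T2*. The function name, the millisecond units, the clipping of negative differences to zero, and the example values are assumptions for illustration, not details taken from the paper.

```python
def r2_prime(t2_ms, t2_star_ms):
    """Return R2' in 1/s from T2 and T2* given in milliseconds.

    Uses R2' = R2* - R2. Negative differences, which can arise from
    noise, are clipped to zero (an assumption of this sketch).
    """
    if t2_ms <= 0 or t2_star_ms <= 0:
        raise ValueError("relaxation times must be positive")
    r2 = 1000.0 / t2_ms            # 1/s
    r2_star = 1000.0 / t2_star_ms  # 1/s
    return max(r2_star - r2, 0.0)


# Hypothetical voxel: T2 = 80 ms, T2* = 50 ms.
example = r2_prime(80.0, 50.0)  # R2 = 12.5 1/s, R2* = 20 1/s, so R2' = 7.5 1/s
```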

arXiv:2108.05985 [pdf]

doi 10.1002/mrm.29194

Optimized multi-axis spiral projection MR fingerprinting with subspace reconstruction for rapid whole-brain high-isotropic-resolution quantitative imaging

Authors: Xiaozhi Cao, Congyu Liao, Siddharth Srinivasan Iyer, Zhixing Wang, Zihan Zhou, Erpeng Dai, Gilad Liberman, Zijing Dong, Ting Gong, Hongjian He, Jianhui Zhong, Berkin Bilgic, Kawin Setsompop

Abstract: Purpose: To improve image quality and accelerate the acquisition of 3D MRF. Methods: Building on the multi-axis spiral-projection MRF technique, a subspace reconstruction with locally low rank (LLR) constraint and a modified spiral-projection spatiotemporal encoding scheme termed tiny-golden-angle-shuffling (TGAS) were implemented for rapid whole-brain high-resolution quantitative mapping. The LLR regularization parameter and the number of subspace bases were tuned using retrospective in-vivo data and simulated examinations, respectively. B0 inhomogeneity correction using multi-frequency interpolation was incorporated into the subspace reconstruction to further improve the image quality by mitigating blurring caused by off-resonance effect. Results: The proposed MRF acquisition and reconstruction framework can provide high quality 1-mm isotropic whole-brain quantitative maps in a total acquisition time of 1 minute 55 seconds, with higher-quality results than ones obtained from the previous approach in 6 minutes. The comparison of quantitative results indicates that neither the subspace reconstruction nor the TGAS trajectory induces bias for T1 and T2 mapping. High quality whole-brain MRF data were also obtained at 0.66-mm isotropic resolution in 4 minutes using the proposed technique, where the increased resolution was shown to improve visualization of subtle brain structures. Conclusion: The proposed TGAS-SPI-MRF with optimized spiral-projection trajectory and subspace reconstruction can enable high-resolution quantitative mapping with faster acquisition speed.

Submitted 12 August, 2021; originally announced August 2021.

Comments: 40 pages, 11 figures, 2 tables

Journal ref: Magnetic Resonance in Medicine, 2022

arXiv:2106.04624 [pdf, other]

SpeechBrain: A General-Purpose Speech Toolkit

Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.

Submitted 8 June, 2021; originally announced June 2021.

Comments: Preprint

arXiv:2106.01918 [pdf]

Highly Accelerated EPI with Wave Encoding and Multi-shot Simultaneous Multi-Slice Imaging

Authors: Jaejin Cho, Congyu Liao, Qiyuan Tian, Zijing Zhang, Jinmin Xu, Wei-Ching Lo, Benedikt A. Poser, V. Andrew Stenger, Jason Stockmann, Kawin Setsompop, Berkin Bilgic

Abstract: We introduce wave encoded acquisition and reconstruction techniques for highly accelerated echo planar imaging (EPI) with reduced g-factor penalty and image artifacts. Wave-EPI involves playing sinusoidal gradients during the EPI readout while employing interslice shifts as in blipped-CAIPI acquisitions. This spreads the aliasing in all spatial directions, thereby taking better advantage of 3D coil sensitivity profiles. The amount of voxel spreading that can be achieved by the wave gradients during the short EPI readout period is constrained by the slew rate of the gradient coils and peripheral nerve stimulation (PNS) monitor. We propose to use a half-cycle sinusoidal gradient to increase the amount of voxel spreading that can be achieved while respecting the slew and stimulation constraints. Extending wave-EPI to multi-shot acquisition minimizes geometric distortion and voxel blurring at high in-plane resolution, while structured low-rank regularization mitigates shot-to-shot phase variations without additional navigators. We propose to use different point spread functions (PSFs) for the k-space lines with positive and negative polarities, which are calibrated with a FLEET-based reference scan and allow for addressing gradient imperfections. Wave-EPI provided whole-brain single-shot gradient echo (GE) and multi-shot spin echo (SE) EPI acquisitions at high acceleration factors and was combined with g-Slider slab encoding to boost the SNR level in 1mm isotropic diffusion imaging. Relative to blipped-CAIPI, wave-EPI reduced average and maximum g-factors by up to 1.21- and 1.37-fold, respectively. In conclusion, wave-EPI allows highly accelerated single- and multi-shot EPI with reduced g-factor and artifacts and may facilitate clinical and neuroscientific applications of EPI by improving the spatial and temporal resolution in functional and diffusion imaging.

Submitted 3 June, 2021; originally announced June 2021.

arXiv:2103.01584 [pdf]

A Practical Framework for ROI Detection in Medical Images -- a case study for hip detection in anteroposterior pelvic radiographs

Authors: Feng-Yu Liu, Chih-Chi Chen, Shann-Ching Chen, Chien-Hung Liao

Abstract: Purpose: Automated detection of regions of interest (ROIs) is a critical step for many medical image applications, such as heart ROI detection in perfusion MRI images, lung boundary detection in chest X-rays, and femoral head detection in pelvic radiographs. We therefore propose a practical framework for ROI detection in medical images, with a case study for hip detection in anteroposterior (AP) pelvic radiographs. Materials and Methods: We conducted a retrospective study that analyzed hip joints seen on 7,399 AP pelvic radiographs from three diverse sources, including 4,290 high resolution radiographs from Chang Gung Memorial Hospital Osteoarthritis, 3,008 low to medium resolution radiographs from Osteoarthritis Initiative, and 101 heterogeneous radiographs from Google image search engine. We presented a deep learning-based ROI detection framework utilizing a single-shot multi-box detector (SSD) with a ResNet-101 backbone and a customized head structure based on the characteristics of the obtained datasets, whose ground truths were labeled by non-medical annotators in a simple graphical interface. Results: Our method achieved average intersection over union (IoU)=0.8115, average confidence=0.9812, and average precision with threshold IoU=0.5 (AP50)=0.9901 in the independent test set, suggesting that the detected hip regions appropriately covered the main features of the hip joints. Conclusion: The proposed approach features low-cost labeling, data-driven model design, and heterogeneous data testing. We have demonstrated the feasibility of training a robust hip region detector for AP pelvic radiographs. This practical framework has promising potential for a wide range of medical image applications.
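
For readers unfamiliar with the IoU metric reported above, the following is a minimal sketch of how IoU is commonly computed for two axis-aligned bounding boxes. The function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box convention are assumptions made for illustration; the abstract does not specify how the authors computed the metric.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an
    illustrative assumption, not taken from the paper.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the overlap counted twice.
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# A predicted hip box counts as a hit under AP50 when its IoU with the
# ground-truth box is at least 0.5.
predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
print(box_iou(predicted, ground_truth))
```
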

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2102.09069 [pdf]

SRDTI: Deep learning-based super-resolution for diffusion tensor MRI

Authors: Qiyuan Tian, Ziyu Li, Qiuyun Fan, Chanon Ngamsombat, Yuxin Hu, Congyu Liao, Fuyixue Wang, Kawin Setsompop, Jonathan R. Polimeni, Berkin Bilgic, Susie Y. Huang

Abstract: High-resolution diffusion tensor imaging (DTI) is beneficial for probing tissue microstructure in fine neuroanatomical structures, but long scan times and limited signal-to-noise ratio pose significant barriers to acquiring DTI at sub-millimeter resolution. To address this challenge, we propose a deep learning-based super-resolution method entitled "SRDTI" to synthesize high-resolution diffusion-weighted images (DWIs) from low-resolution DWIs. SRDTI employs a deep convolutional neural network (CNN), residual learning and multi-contrast imaging, and generates high-quality results with rich textural details and microstructural information, which are more similar to high-resolution ground truth than those from trilinear and cubic spline interpolation.

Submitted 17 February, 2021; originally announced February 2021.

arXiv:2009.06600 [pdf, ps, other]

SNR-enhanced diffusion MRI with structure-preserving low-rank denoising in reproducing kernel Hilbert spaces

Authors: Gabriel Ramos-Llordén, Gonzalo Vegas-Sánchez-Ferrero, Congyu Liao, Carl-Fredrik Westin, Kawin Setsompop, Yogesh Rathi

Abstract: Purpose: To introduce, develop, and evaluate a novel denoising technique for diffusion MRI that leverages non-linear redundancy in the data to boost the SNR while preserving signal information. Methods: We exploit non-linear redundancy of the dMRI data by means of Kernel Principal Component Analysis (KPCA), a non-linear generalization of PCA to reproducing kernel Hilbert spaces. By mapping the signal to a high-dimensional space, better redundancy is achieved despite nonlinearities in the data, thereby enabling better denoising than linear PCA. We implement KPCA with a Gaussian kernel, with parameters automatically selected from knowledge of the noise statistics, and validate it on realistic Monte-Carlo simulations as well as on in-vivo human brain submillimeter resolution dMRI data. We also demonstrate KPCA denoising using multi-coil dMRI data. Results: SNR improvements of up to 2.7 X were obtained in real in-vivo datasets denoised with KPCA, in comparison to SNR gains of up to 1.8 X when using state-of-the-art PCA denoising, e.g., Marchenko-Pastur PCA (MPPCA). Compared to gold-standard dataset references created from averaged data, we showed that lower normalized root mean squared error (NRMSE) was achieved with KPCA than with MPPCA. Statistical analysis of residuals shows that only noise is removed. Improvements in the estimation of diffusion model parameters such as fractional anisotropy, mean diffusivity, and fiber orientation distribution functions (fODFs) were demonstrated. Conclusion: Non-linear redundancy of the dMRI signal can be exploited with KPCA, which allows greater noise reduction and SNR improvement than state-of-the-art PCA methods, without loss of signal information.

Submitted 14 September, 2020; originally announced September 2020.

arXiv:2008.07618 [pdf, other]

Incorporating Broad Phonetic Information for Speech Enhancement

Authors: Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao

Abstract: In noisy conditions, knowing the speech content helps listeners suppress background noise components more effectively and retrieve pure speech signals. Previous studies have also confirmed the benefits of incorporating phonetic information in a speech enhancement (SE) system to achieve better denoising performance. To obtain the phonetic information, we usually prepare a phoneme-based acoustic model, which is trained using speech waveforms and phoneme labels. Although this approach performs well in normal noisy conditions, in very noisy conditions the recognized phonemes may be erroneous and thus misguide the SE process. To overcome this limitation, this study proposes to incorporate broad phonetic class (BPC) information into the SE process. We have investigated three criteria to build the BPC, including two knowledge-based criteria (place and manner of articulation) and one data-driven criterion. Moreover, the recognition accuracies of BPCs are much higher than those of phonemes, thus providing more accurate phonetic information to guide the SE process under very noisy conditions. Experimental results demonstrate that the proposed SE with the BPC information framework can achieve notable performance improvements over the baseline system and an SE system using monophonic information in terms of both speech quality and intelligibility on the TIMIT dataset.

Submitted 13 August, 2020; originally announced August 2020.

Comments: to be published in Interspeech 2020

arXiv:2007.11784 [pdf]

Deep Learning Based Segmentation of Various Brain Lesions for Radiosurgery

Authors: Siang-Ruei Wu, Hao-Yun Chang, Florence T Su, Heng-Chun Liao, Wanju Tseng, Chun-Chih Liao, Feipei Lai, Feng-Ming Hsu, Furen Xiao

Abstract: Semantic segmentation of medical images with deep learning models is developing rapidly. In this study, we benchmarked state-of-the-art deep learning segmentation algorithms on our clinical stereotactic radiosurgery dataset, demonstrating the strengths and weaknesses of these algorithms in a fairly practical scenario. In particular, we compared the model performances with respect to their sampling method, model architecture, and the choice of loss functions, identifying the suitable settings for their applications and shedding light on the possible improvements.

Submitted 22 July, 2020; originally announced July 2020.

arXiv:2006.10296 [pdf]

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

Authors: Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

Abstract: The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications. Therefore, our study applies a modified Transformer in a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, and hence, it is replaced by convolutional layers. To further improve the perceptual evaluation of speech quality (PESQ) scores of enhanced speech, the L_1 pre-trained Transformer is fine-tuned using a MetricGAN framework. The proposed MetricGAN can be treated as a general post-processing module to further boost the objective scores of interest. The experiments were conducted using the data sets provided by the organizer of the Deep Noise Suppression (DNS) challenge. Experimental results demonstrated that the proposed system outperformed the challenge baseline by a large margin in both subjective and objective evaluations.

Submitted 3 March, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: Accepted by APSIPA 2020

arXiv:2005.13201 [pdf, other]

Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation

Authors: Ashwin Raju, Chi-Tung Cheng, Yuankai Huo, Jinzheng Cai, Junzhou Huang, Jing Xiao, Le Lu, ChienHuang Liao, Adam P Harrison

Abstract: In medical imaging, organ/pathology segmentation models trained on current publicly available and fully-annotated datasets usually do not represent well the heterogeneous modalities, phases, pathologies, and clinical scenarios encountered in real environments. On the other hand, there are tremendous amounts of unlabelled patient imaging scans stored by many modern clinical centers. In this work, we present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe), which only requires a small labeled cohort of single phase imaging data to adapt to any unlabeled cohort of heterogeneous multi-phase data with possibly new clinical scenarios and pathologies. To do this, we propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling. We also introduce co-heterogeneous training, which is a novel integration of co-training and hetero modality learning. We have evaluated CHASe using a clinically comprehensive and challenging dataset of multi-phase computed tomography (CT) imaging studies (1147 patients and 4577 3D volumes). Compared to previous state-of-the-art baselines, CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$, depending on the phase combinations: e.g., from $84.6\%$ to $94.0\%$ on non-contrast CTs.

Submitted 19 July, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

Comments: 23 pages, 8 figures

arXiv:2005.12209 [pdf, other]

JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans

Authors: Fengze Liu, Jinzheng Cai, Yuankai Huo, Chi-Tung Cheng, Ashwin Raju, Dakai Jin, Jing Xiao, Alan Yuille, Le Lu, ChienHung Liao, Adam P Harrison

Abstract: Multi-modal image registration is a challenging problem that is also an important clinical task for many real applications and scenarios. As a first step in analysis, deformable registration among different image modalities is often required in order to provide complementary visual information. During registration, semantic information is key to matching homologous points and pixels. Nevertheless, many conventional registration methods are incapable of capturing high-level semantic anatomical dense correspondences. In this work, we propose a novel multi-task learning system, JSSR, based on an end-to-end 3D convolutional neural network that is composed of a generator, a registration and a segmentation component. The system is optimized to satisfy the implicit constraints between different tasks in an unsupervised manner. It first synthesizes the source domain images into the target domain, then an intra-modal registration is applied on the synthesized images and target images. The segmentation module is then applied on the synthesized and target images, providing additional cues based on semantic correspondences. The supervision from another fully-annotated dataset is used to regularize the segmentation. We extensively evaluate JSSR on a large-scale medical image dataset containing 1,485 patient CT imaging studies of four different contrast phases (i.e., 5,940 3D CT scans with pathological livers) on the registration, segmentation and synthesis tasks. The performance is improved after joint training on the registration and segmentation tasks by 0.9% and 1.9% respectively compared to a highly competitive and accurate deep learning baseline. The registration also consistently outperforms conventional state-of-the-art multi-modal registration methods.

Submitted 17 July, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

Comments: accepted to ECCV 2020

arXiv:1911.07219 [pdf]

Scan-specific, Parameter-free Artifact Reduction in K-space (SPARK)

Authors: Onur Beker, Congyu Liao, Jaejin Cho, Zijing Zhang, Kawin Setsompop, Berkin Bilgic

Abstract: We propose a convolutional neural network (CNN) approach that works synergistically with physics-based reconstruction methods to reduce artifacts in accelerated MRI. Given reconstructed coil k-spaces, our network predicts a k-space correction term for each coil. This is done by matching the difference between the acquired autocalibration lines and their erroneous reconstructions, and generalizing this error term over the entire k-space. Application of this approach to existing reconstruction methods shows that SPARK suppresses reconstruction artifacts at high acceleration, while preserving and improving on detail at moderate acceleration rates where existing reconstruction algorithms already perform well, indicating robustness.

Submitted 17 November, 2019; originally announced November 2019.

Comments: 5 figures

arXiv:1910.14211 [pdf]

Accelerated spin-echo fMRI using Multisection Excitation by Simultaneous Spin-echo Interleaving (MESSI) with complex-encoded generalized SLIce Dithered Enhanced Resolution (cgSlider) Simultaneous Multi-Slice Echo-Planar Imaging

Authors: SoHyun Han, Congyu Liao, Mary Kate Manhard, Daniel Joseph Park, Berkin Bilgic, Merlin J. Fair, Fuyixue Wang, Anna I. Blazejewska, William A. Grissom, Jonathan R. Polimeni, Kawin Setsompop

Abstract: Spin-echo functional MRI (SE-fMRI) has the potential to improve spatial specificity when compared to gradient-echo fMRI. However, high spatiotemporal resolution SE-fMRI with large slice-coverage is challenging as SE-fMRI requires a long echo time (TE) to generate blood oxygenation level-dependent (BOLD) contrast, leading to long repetition times (TR). The aim of this work is to develop an acquisition method that enhances the slice-coverage of SE-fMRI at high spatiotemporal resolution. An acquisition scheme was developed entitled Multisection Excitation by Simultaneous Spin-echo Interleaving (MESSI) with complex-encoded generalized SLIce Dithered Enhanced Resolution (cgSlider). MESSI utilizes the dead-time during the long TE by interleaving the excitation and readout of two slices to enable 2x slice-acceleration, while cgSlider utilizes the stable temporal background phase in SE-fMRI to encode and decode two adjacent slices simultaneously with a phase-constrained reconstruction method. The proposed cgSlider-MESSI was also combined with Simultaneous Multi-Slice (SMS) to achieve further slice-acceleration. This combined approach was used to achieve 1.5 mm isotropic whole-brain SE-fMRI with a temporal resolution of 1.5 s and was evaluated using sensory stimulation and breath-hold tasks at 3T. Compared to conventional SE-SMS, cgSlider-MESSI-SMS provides a four-fold increase in slice-coverage for the same TR, with comparable temporal signal-to-noise ratio. The corresponding fMRI activation from cgSlider-MESSI-SMS for both fMRI tasks was consistent with that from conventional SE-SMS. Overall, cgSlider-MESSI-SMS achieved a 32x encoding-acceleration by combining R_inplane x MB x cgSlider x MESSI = 4 x 2 x 2 x 2. High-quality, high-resolution whole-brain SE-fMRI was acquired at a short TR using cgSlider-MESSI-SMS.
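
The acceleration figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below simply multiplies the four factors stated in the abstract; the dictionary keys are labels chosen here for illustration, and reading the four-fold slice-coverage gain as the product of the cgSlider and MESSI factors is an interpretation that is consistent with the stated numbers rather than something the abstract spells out.

```python
import math

# Acceleration factors as stated in the abstract (labels are illustrative).
factors = {
    "R_inplane": 4,  # in-plane undersampling
    "MB": 2,         # multiband (simultaneous multi-slice)
    "cgSlider": 2,   # two adjacent slices encoded together
    "MESSI": 2,      # two slices interleaved within the long TE
}

# Total encoding acceleration is the product of all factors.
total_acceleration = math.prod(factors.values())

# Assumed reading: the gain over conventional SE-SMS comes from the two
# factors that SE-SMS lacks, cgSlider and MESSI.
coverage_gain = factors["cgSlider"] * factors["MESSI"]

print(total_acceleration, coverage_gain)
```
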

Submitted 30 October, 2019; originally announced October 2019.

Comments: 38 pages, 9 figures, ISMRM2019 #1165

arXiv:1909.12999 [pdf]

doi 10.1002/mrm.28872

Efficient T2 mapping with Blip-up/down EPI and gSlider-SMS (T2-BUDA-gSlider)

Authors: Xiaozhi Cao, Congyu Liao, Zijing Zhang, Siddharth Srinivasan Iyer, Kang Wang, Hongjian He, Huafeng Liu, Kawin Setsompop, Jianhui Zhong, Berkin Bilgic

Abstract: Purpose: To rapidly obtain high isotropic-resolution T2 maps with whole-brain coverage and high geometric fidelity. Methods: A T2 blip-up/down echo planar imaging (EPI) acquisition with generalized Slice-dithered enhanced resolution (T2-BUDA-gSlider) is proposed. A radiofrequency (RF)-encoded multi-slab spin-echo EPI acquisition with multiple echo times (TEs) was developed to obtain high SNR efficiency with reduced repetition time (TR). This was combined with an interleaved 2-shot EPI acquisition using blip-up/down phase encoding. An estimated field map was incorporated into the joint multi-shot EPI reconstruction with a structured low rank constraint to achieve distortion-free and robust reconstruction for each slab without navigation. A Bloch simulated subspace model was integrated into gSlider reconstruction and utilized for T2 quantification. Results: In vivo results demonstrated that the T2 values estimated by the proposed method were consistent with gold standard spin-echo acquisition. Compared to the reference 3D fast spin echo (FSE) images, distortion caused by off-resonance and eddy current effects was effectively mitigated. Conclusion: BUDA-gSlider SE-EPI acquisition and gSlider-subspace joint reconstruction enabled distortion-free whole-brain T2 mapping in 2 min at ~1 mm3 isotropic resolution, which could bring significant benefits to related clinical and neuroscience applications.

Submitted 20 September, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

Comments: 20 pages, 7 figures

Journal ref: Magnetic Resonance in Medicine (2020)

arXiv:1909.07925 [pdf, other]

High-fidelity, accelerated whole-brain submillimeter in-vivo diffusion MRI using gSlider-Spherical Ridgelets (gSlider-SR)

Authors: Gabriel Ramos-Llordén, Lipeng Ning, Congyu Liao, Rinat Mukhometzianov, Oleg Michailovich, Kawin Setsompop, Yogesh Rathi

Abstract: Purpose: To develop an accelerated, robust, and accurate diffusion MRI acquisition and reconstruction technique for submillimeter whole human brain in-vivo scans on a clinical scanner. Methods: We extend the ultra-high resolution diffusion MRI acquisition technique, gSlider, by allowing under-sampling in q-space and Radio-Frequency (RF)-encoded data, thereby accelerating the total acquisition time of conventional gSlider. The novel method, termed gSlider-SR, compensates for the lack of acquired information by exploiting redundancy in the dMRI data using a basis of Spherical Ridgelets (SR), while simultaneously enhancing the signal-to-noise ratio. Using Monte-Carlo simulation with realistic noise levels and several acquisitions of in-vivo human brain dMRI data (acquired on a Siemens Prisma 3T scanner), we demonstrate the efficacy of our method using several quantitative metrics. Results: For high-resolution dMRI data with realistic noise levels (synthetically added), we show that gSlider-SR can reconstruct high-quality dMRI data at different acceleration factors, preserving both signal and angular information. With in-vivo data, we demonstrate that gSlider-SR can accurately reconstruct 860 $μm$ diffusion MRI data (64 diffusion directions at b = 2000 $s/{mm}^2$) at a quality comparable to that obtained with conventional gSlider with four averages, thereby providing an eight-fold reduction in scan time (from 1 h 20 min to 10 min). Conclusion: gSlider-SR enables whole-brain high angular resolution dMRI at a submillimeter spatial resolution with a dramatically reduced acquisition time, making it feasible to use the proposed scheme on existing clinical scanners.

Submitted 4 March, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

arXiv:1909.02511 [pdf, other]

CT Data Curation for Liver Patients: Phase Recognition in Dynamic Contrast-Enhanced CT

Authors: Bo Zhou, Adam P. Harrison, Jiawen Yao, Chi-Tung Cheng, Jing Xiao, Chien-Hung Liao, Le Lu

Abstract: As the demand for more descriptive machine learning models grows within medical imaging, bottlenecks due to data paucity will worsen. Thus, collecting enough large-scale data will require automated tools to harvest data/label pairs from messy and real-world datasets, such as hospital PACS. This is the focus of our work, where we present a principled data curation tool to extract multi-phase CT liver studies and identify each scan's phase from a real-world and heterogeneous hospital PACS dataset. Emulating a typical deployment scenario, we first obtain a set of noisy labels from our institutional partners that are text mined using simple rules from DICOM tags. We train a deep learning system, using a customized and streamlined 3D SE architecture, to identify non-contrast, arterial, venous, and delay phase dynamic CT liver scans, filtering out anything else, including other types of liver contrast studies. To exploit as much training data as possible, we also introduce an aggregated cross entropy loss that can learn from scans only identified as "contrast". Extensive experiments on a dataset of 43K scans of 7680 patient imaging studies demonstrate that our 3D SE architecture, armed with our aggregated loss, can achieve a mean F1 of 0.977 and can correctly harvest up to 92.7% of studies, which significantly outperforms the text-mined and standard-loss approach, and also outperforms other, more complex, model architectures.
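
The abstract does not give the formula for the aggregated cross entropy loss. One plausible reading, sketched below purely as an assumption, is that a scan labeled only as "contrast" is scored on the summed probability of the contrast-enhanced phases, while a fully labeled scan uses the ordinary cross entropy. The class list, the `CONTRAST` grouping, and the function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical class layout: four phases named in the abstract plus a
# catch-all class for everything that should be filtered out.
PHASES = ["non-contrast", "arterial", "venous", "delay", "other"]
# Assumed grouping of the classes that a weak "contrast" label could cover.
CONTRAST = ("arterial", "venous", "delay")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregated_cross_entropy(logits, label):
    """Cross entropy that also accepts the weak label "contrast".

    For a specific phase label this is the usual -log p(label).
    For "contrast" the probabilities of all contrast phases are summed
    first, so the model is only penalized for leaving that group.
    """
    probs = dict(zip(PHASES, softmax(logits)))
    if label == "contrast":
        p = sum(probs[name] for name in CONTRAST)
    else:
        p = probs[label]
    return -math.log(p)


# With uniform logits the weak label is cheaper to satisfy than a
# specific phase label, because it covers three of the five classes.
uniform = [0.0] * len(PHASES)
weak_loss = aggregated_cross_entropy(uniform, "contrast")
strong_loss = aggregated_cross_entropy(uniform, "arterial")
print(weak_loss, strong_loss)
```
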

Submitted 27 September, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

Comments: 11 pages, accepted by 2019 MICCAI - Medical Image Learning with Less Labels and Imperfect Data Workshop

arXiv:1909.02448 [pdf, other]

High-Fidelity State-of-Charge Estimation of Li-Ion Batteries Using Machine Learning

Authors: Weizhong Wang, Nicholas W. Brady, Chenyao Liao, Youssef A. Fahmy, Ephrem Chemali, Alan C. West, Matthias Preindl

Abstract: This paper proposes a way to augment the existing machine learning algorithm applied to state-of-charge estimation by introducing a form of pulse injection to the running battery cells. It is believed that the information contained in the pulse responses can be interpreted by a machine learning algorithm, whereas other techniques have difficulty decoding it due to the nonlinearity. The sensitivity analysis of the amplitude of the current pulse is given through simulation, allowing researchers to select the appropriate current level with respect to the desired accuracy improvement. A multi-layer feedforward neural network is trained to acquire the nonlinear relationship between the pulse train and the ground-truth SoC. The network is trained on the experimental data, and the results are shown to be promising, with less than 2% SoC estimation error using layer sizes in the range of 10 - 10,000 trained in 0 - 1 million epochs. The testing procedure specifically designed for the proposed technique is explained and provided. The implementation of the proposed strategy is also discussed. The detailed system layout to perform the augmented SoC estimation integrated in the existing active balancing hardware is also given.

Submitted 30 August, 2019; originally announced September 2019.

Comments: 8 pages, 12 figures

arXiv:1908.00983 [pdf]

doi 10.1117/12.2527183

Highly efficient MRI through multi-shot echo planar imaging

Authors: Congyu Liao, Xiaozhi Cao, Jaejin Cho, Zijing Zhang, Kawin Setsompop, Berkin Bilgic

Abstract: Multi-shot echo planar imaging (msEPI) is a promising approach to achieve high in-plane resolution with high sampling efficiency and low T2* blurring. However, due to geometric distortion, shot-to-shot phase variations and potential subject motion, msEPI continues to be a challenge in MRI. In this work, we introduce acquisition and reconstruction strategies for robust, high-quality msEPI without phase navigators. We propose Blip Up-Down Acquisition (BUDA) using interleaved blip-up and -down phase encoding, and incorporate B0 forward-modeling into a Hankel structured low-rank model to enable distortion- and navigator-free msEPI. We improve the acquisition efficiency and reconstruction quality by incorporating simultaneous multi-slice acquisition and virtual-coil reconstruction into the BUDA technique. We further combine BUDA with the novel RF-encoded gSlider acquisition, dubbed BUDA-gSlider, to achieve rapid high isotropic-resolution MRI. Deploying BUDA-gSlider with model-based reconstruction allows for distortion-free whole-brain 1 mm isotropic T2 mapping in about 1 minute. It also provides whole-brain 1 mm isotropic diffusion imaging with high geometric fidelity and SNR efficiency. We finally incorporate sinusoidal wave gradients during the EPI readout to better use coil sensitivity encoding with controlled aliasing.

Submitted 2 August, 2019; originally announced August 2019.

Comments: 13 pages, 10 figures

Journal ref: Proceedings Volume 11138, Wavelets and Sparsity XVIII; 1113818 (2019)

arXiv:1907.13261 [pdf, other]

Robust Autocalibrated Structured Low-Rank EPI Ghost Correction

Authors: Rodrigo A. Lobos, W. Scott Hoge, Ahsan Javed, Congyu Liao, Kawin Setsompop, Krishna S. Nayak, Justin P. Haldar

Abstract: Purpose: We propose and evaluate a new structured low-rank method for EPI ghost correction called Robust Autocalibrated LORAKS (RAC-LORAKS). The method can be used to suppress EPI ghosts arising from the differences between different readout gradient polarities and/or the differences between different shots. It does not require conventional EPI navigator signals, and is robust to imperfect autocalibration data. Methods: Autocalibrated LORAKS is a previous structured low-rank method for EPI ghost correction that uses GRAPPA-type autocalibration data to enable high-quality ghost correction. This method works well when the autocalibration data is pristine, but performance degrades substantially when the autocalibration information is imperfect. RAC-LORAKS generalizes Autocalibrated LORAKS in two ways. First, it does not completely trust the information from autocalibration data, and instead considers the autocalibration and EPI data simultaneously when estimating low-rank matrix structure. And second, it uses complementary information from the autocalibration data to improve EPI reconstruction in a multi-contrast joint reconstruction framework. RAC-LORAKS is evaluated using simulations and in vivo data, including comparisons to state-of-the-art methods. Results: RAC-LORAKS is demonstrated to have good ghost elimination performance compared to state-of-the-art methods in several complicated EPI acquisition scenarios (including gradient-echo brain imaging, diffusion-encoded brain imaging, and cardiac imaging). Conclusion: RAC-LORAKS provides effective suppression of EPI ghosts and is robust to imperfect autocalibration data.

Submitted 1 October, 2020; v1 submitted 30 July, 2019; originally announced July 2019.

arXiv:1905.04874 [pdf, other]

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

Authors: Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin

Abstract: Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores. To overcome this issue, we propose a novel MetricGAN approach with an aim to optimize the generator with respect to one or multiple evaluation metrics. Moreover, based on MetricGAN, the metric scores of the generated data can also be arbitrarily specified by users. We tested the proposed MetricGAN on a speech enhancement task, which is particularly suitable to verify the proposed approach because there are multiple metrics measuring different aspects of speech signals. Moreover, these metrics are generally complex and could not be fully optimized by Lp or conventional adversarial losses.

Submitted 13 May, 2019; originally announced May 2019.

Comments: Accepted by Thirty-sixth International Conference on Machine Learning (ICML) 2019

arXiv:1905.01898 [pdf]

doi 10.1109/LSP.2019.2953810

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

Authors: Szu-Wei Fu, Chien-Feng Liao, Yu Tsao

Abstract: Utilizing a human-perception-related objective function to train a speech enhancement model has become a popular topic recently. The main reason is that the conventional mean squared error (MSE) loss cannot represent auditory perception well. One of the typical human-perception-related metrics, which is the perceptual evaluation of speech quality (PESQ), has been proven to provide a high correlation to the quality scores rated by humans. Owing to its complex and non-differentiable properties, however, the PESQ function may not be used to optimize speech enhancement models directly. In this study, we propose optimizing the enhancement model with an approximated PESQ function, which is differentiable and learned from the training data. The experimental results show that the learned surrogate function can guide the enhancement model to further boost the PESQ score (increase of 0.18 points compared to the results trained with MSE loss) and maintain the speech intelligibility.

Submitted 14 November, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

Comments: Accepted by IEEE Signal Processing Letters (SPL)

arXiv:1904.13142 [pdf, other]

Incorporating Symbolic Sequential Modeling for Speech Enhancement

Authors: Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai

Abstract: In a noisy environment, a lossy speech signal can be automatically restored by a listener if he/she knows the language well. That is, with the built-in knowledge of a "language model", a listener may effectively suppress noise interference and retrieve the target speech signals. Accordingly, we argue that familiarity with the underlying linguistic content of spoken utterances benefits speech enhancement (SE) in noisy environments. In this study, in addition to the conventional modeling for learning the acoustic noisy-clean speech mapping, an abstract symbolic sequential modeling is incorporated into the SE framework. This symbolic sequential modeling can be regarded as a "linguistic constraint" in learning the acoustic noisy-clean speech mapping function. In this study, the symbolic sequences for acoustic signals are obtained as discrete representations with a Vector Quantized Variational Autoencoder algorithm. The obtained symbols are able to capture high-level phoneme-like content from speech signals. The experimental results demonstrate that the proposed framework can obtain notable performance improvement in terms of perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) on the TIMIT dataset.

Submitted 1 July, 2019; v1 submitted 30 April, 2019; originally announced April 2019.

Comments: Accepted to Interspeech 2019

arXiv:1812.08067 [pdf]

doi 10.1002/mrm.27812

Ultrashort Echo Time Magnetic Resonance Fingerprinting (UTE-MRF) for Simultaneous Quantification of Long and Ultrashort T2 Tissues

Authors: Qing Li, Xiaozhi Cao, Huihui Ye, Congyu Liao, Hongjian He, Jianhui Zhong

Abstract: Purpose: To demonstrate an ultrashort echo time magnetic resonance fingerprinting (UTE-MRF) method that can simultaneously quantify tissue relaxometries for muscle and bone in musculoskeletal systems and tissue components in brain and therefore can synthesize pseudo-CT images. Methods: A FISP-MRF sequence with half pulse excitation and half spoke radial acquisition was designed to sample fast T2 decay signals. A sinusoidal echo time (TE) pattern was applied to enhance MRF sensitivity for tissues with short and ultrashort T2 values. The performance of UTE-MRF was evaluated via simulations, phantoms, and in vivo experiments. Results: A minimal TE of 0.05 ms was achieved in UTE-MRF. Simulations indicated that extension of TE sampling increased T2 quantification accuracy in cortical bone and tendon, and had little impact on long T2 muscle quantifications. For a rubber phantom, an average T1/T2 of 162/1.07 ms from UTE-MRF compared well with the gold standard T1 of 190 ms from IR-UTE and T2* of 1.03 ms from the UTE sequence. For a long T2 agarose phantom, the linear regression slope between UTE-MRF and gold standard was 1.07 (R2=0.991) for T1 and 1.04 (R2=0.994) for T2. In vivo experiments showed the detection of cortical bone and Achilles tendon, where the averaged T2 was respectively 1.0 ms and 15 ms. Scalp images were in good agreement with CT. Conclusion: UTE-MRF with sinusoidal TE variations shows its capability to produce pseudo-CT images and simultaneously output T1, T2, proton density, and B0 maps for tissues with long T2 and short/ultrashort T2 in the brain and musculoskeletal system.

Submitted 27 March, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

Comments: 32 pages, 12 figures, 1 table

Journal ref: Magnetic Resonance in Medicine (2019)

arXiv:1811.05473 [pdf]

doi 10.1002/mrm.27899

High-fidelity, high-isotropic resolution diffusion imaging through gSlider acquisition with B1+ & T1 corrections and integrated ΔB0/Rx shim array

Authors: Congyu Liao, Jason Stockmann, Qiyuan Tian, Berkin Bilgic, Nicolas S. Arango, Mary Kate Manhard, William A. Grissom, Lawrence L. Wald, Kawin Setsompop

Abstract: Purpose: B1+ and T1 corrections and dynamic multi-coil shimming approaches were proposed to improve the fidelity of high isotropic resolution Generalized slice dithered enhanced resolution (gSlider) diffusion imaging. Methods: An extended reconstruction incorporating B1+ inhomogeneity and T1 recovery information was developed to mitigate slab-boundary artifacts in short-TR gSlider acquisitions. Slab-by-slab dynamic B0 shimming using a multi-coil integrated ΔB0/Rx shim-array, and high in-plane acceleration (Rinplane=4) achieved with virtual-coil GRAPPA were also incorporated into a 1 mm isotropic resolution gSlider acquisition/reconstruction framework to achieve an 8-11 fold reduction in geometric distortion compared to single-shot EPI. Results: The slab-boundary artifacts were alleviated by the proposed B1+ and T1 corrections compared to the standard gSlider reconstruction pipeline for short-TR acquisitions. Dynamic shimming provided >50% reduction in geometric distortion compared to conventional global 2nd order shimming. 1 mm isotropic resolution diffusion data show that the typically problematic temporal and frontal lobes of the brain can be imaged with high geometric fidelity using dynamic shimming. Conclusions: The proposed B1+ and T1 corrections and local-field control substantially improved the fidelity of high isotropic resolution diffusion imaging, with reduced slab-boundary artifacts and geometric distortion compared to conventional gSlider acquisition and reconstruction. This enabled high-fidelity whole-brain 1 mm isotropic diffusion imaging with 64 diffusion-directions in 20 minutes using a 3T clinical scanner.

Submitted 26 March, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

Comments: 7 figures

Journal ref: Magnetic Resonance in Medicine (2019)

arXiv:1810.02027 [pdf]

Polar Feature Based Deep Architectures for Automatic Modulation Classification Considering Channel Fading

Authors: Chieh-Fang Teng, Ching-Chun Liao, Chun-Hsiang Chen, An-Yeu Wu

Abstract: To develop intelligent receivers, automatic modulation classification (AMC) plays an important role for better spectrum utilization. The emerging deep learning (DL) technique has received much attention in AMC due to its superior performance in classifying data with deep structure. In this work, a novel polar-based deep learning architecture with channel compensation network (CCN) is proposed. Our test results show that learning features from polar domain (r-theta) can improve recognition accuracy by 5% and reduce training overhead by 48%. Besides, the proposed CCN is also robust to channel fading, such as amplitude and phase offsets, and can improve the recognition accuracy by 14% under practical channel environments.

Submitted 7 October, 2018; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: 5 pages, accepted by the 2018 Sixth IEEE Global Conference on Signal and Information Processing

arXiv:1808.02814 [pdf]

Highly Accelerated Multishot EPI through Synergistic Machine Learning and Joint Reconstruction

Authors: Berkin Bilgic, Itthi Chatnuntawech, Mary Kate Manhard, Qiyuan Tian, Congyu Liao, Stephen F. Cauley, Susie Y. Huang, Jonathan R. Polimeni, Lawrence L. Wald, Kawin Setsompop

Abstract: Purpose: To introduce a combined machine learning (ML) and physics-based image reconstruction framework that enables navigator-free, highly accelerated multishot echo planar imaging (msEPI), and demonstrate its application in high-resolution structural and diffusion imaging. Methods: Singleshot EPI is an efficient encoding technique, but does not lend itself well to high-resolution imaging due to severe distortion artifacts and blurring. While msEPI can mitigate these artifacts, high-quality msEPI has been elusive because of phase mismatch arising from shot-to-shot variations which preclude the combination of the multiple-shot data into a single image. We employ deep learning to obtain an interim image with minimal artifacts, which permits estimation of image phase variations due to shot-to-shot changes. These variations are then included in a Joint Virtual Coil Sensitivity Encoding (JVC-SENSE) reconstruction to utilize data from all shots and improve upon the ML solution. Results: Our combined ML + physics approach enabled Rinplane x MultiBand (MB) = 8x2-fold acceleration using 2 EPI-shots for multi-echo imaging, so that whole-brain T2 and T2* parameter maps could be derived from an 8.3 sec acquisition at 1x1x3mm3 resolution. This has also allowed high-resolution diffusion imaging with high geometric fidelity using 5-shots at Rinplane x MB = 9x2-fold acceleration. To make these possible, we extended the state-of-the-art MUSSELS reconstruction technique to Simultaneous MultiSlice (SMS) encoding and used it as an input to our ML network. Conclusion: Combination of ML and JVC-SENSE enabled navigator-free msEPI at higher accelerations than previously possible while using fewer shots, with reduced vulnerability to poor generalizability and poor acceptance of end-to-end ML approaches.

Submitted 24 March, 2019; v1 submitted 8 August, 2018; originally announced August 2018.

arXiv:1807.07501 [pdf, other]

Noise Adaptive Speech Enhancement using Domain Adversarial Training

Authors: Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang

Abstract: In this study, we propose a novel noise adaptive speech enhancement (SE) system, which employs a domain adversarial training (DAT) approach to tackle the issue of a noise type mismatch between the training and testing conditions. Such a mismatch is a critical problem in deep-learning-based SE systems. A large mismatch may cause serious degradation of SE performance. Because we generally use a well-trained SE system to handle various unseen noise types, a noise type mismatch commonly occurs in real-world scenarios. The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model. During adaptation, the DAT approach encourages the encoder to produce noise-invariant features based on the information from the discriminator model and consequently increases the robustness of the enhancement model to unseen noise types. Herein, we regard stationary noises as the source domain (with the ground truth of clean speech) and non-stationary noises as the target domain (without the ground truth). We evaluated the proposed system on TIMIT sentences. The experiment results show that the proposed noise adaptive SE system successfully provides significant improvements in PESQ (19.0%), SSNR (39.3%), and STOI (27.0%) over the SE system without an adaptation.

Submitted 1 July, 2019; v1 submitted 19 July, 2018; originally announced July 2018.

Comments: Accepted to Interspeech 2019

arXiv:1710.08062 [pdf, ps, other]

doi 10.1109/TMI.2018.2873704

Optimal Experiment Design for Magnetic Resonance Fingerprinting: Cramér-Rao Bound Meets Spin Dynamics

Authors: Bo Zhao, Justin P. Haldar, Congyu Liao, Dan Ma, Yun Jiang, Mark A. Griswold, Kawin Setsompop, Lawrence L. Wald

Abstract: Magnetic resonance (MR) fingerprinting is a new quantitative imaging paradigm, which simultaneously acquires multiple MR tissue parameter maps in a single experiment. In this paper, we present an estimation-theoretic framework to perform experiment design for MR fingerprinting. Specifically, we describe a discrete-time dynamic system to model spin dynamics, and derive an estimation-theoretic bound, i.e., the Cramér-Rao bound (CRB), to characterize the signal-to-noise ratio (SNR) efficiency of an MR fingerprinting experiment. We then formulate an optimal experiment design problem, which determines a sequence of acquisition parameters to encode MR tissue parameters with the maximal SNR efficiency, while respecting the physical constraints and other constraints from the image decoding/reconstruction process. We evaluate the performance of the proposed approach with numerical simulations, phantom experiments, and in vivo experiments. We demonstrate that the optimized experiments substantially reduce data acquisition time and/or improve parameter estimation. For example, the optimized experiments achieve about a factor of two improvement in the accuracy of $T_2$ maps, while keeping similar or slightly better accuracy of $T_1$ maps. Finally, as a remarkable observation, we find that the sequence of optimized acquisition parameters appears to be highly structured rather than randomly/pseudo-randomly varying as is prescribed in the conventional MR fingerprinting experiments.

Submitted 1 October, 2018; v1 submitted 22 October, 2017; originally announced October 2017.

Comments: Manuscript accepted by the IEEE Transactions on Medical Imaging (18 pages, 17 figures)

arXiv:1412.7888 [pdf, ps, other]

Accurate Distributed Time Synchronization in Mobile Wireless Sensor Networks from Noisy Difference Measurements

Authors: Chenda Liao, Prabir Barooah

Abstract: We propose a distributed algorithm for time synchronization in mobile wireless sensor networks. Each node can employ the algorithm to estimate the global time based on its local clock time. The problem of time synchronization is formulated as nodes estimating their skews and offsets from noisy difference measurements of offsets and logarithm of skews; the measurements are acquired by time-stamped message exchanges between neighbors. A distributed stochastic approximation based algorithm is proposed to ensure that the estimation error is mean square convergent (variance converging to 0) under certain conditions. A sequence of scheduled update instants is used to meet the requirement of decreasing time-varying gains that need to be synchronized across nodes with unsynchronized clocks. Moreover, a modification of the algorithm is also presented to improve the initial convergence speed. Simulations indicate that highly accurate global time estimates can be achieved with the proposed algorithm for long time durations, while the errors in competing algorithms increase over time.

Submitted 25 December, 2014; originally announced December 2014.

arXiv:1301.2218 [pdf, ps, other]

Estimation from Relative Measurements in Mobile Networks with Markovian Switching Topology: Clock Skew and Offset Estimation for Time Synchronization

Authors: Chenda Liao, Prabir Barooah

Abstract: We analyze a distributed algorithm for estimation of scalar parameters belonging to nodes in a mobile network from noisy relative measurements. The motivation comes from the problem of clock skew and offset estimation for the purpose of time synchronization. The time variation of the network was modeled as a Markov chain. The estimates are shown to be mean square convergent under fairly weak assumptions on the Markov chain, as long as the union of the graphs is connected. Expressions for the asymptotic mean and correlation are also provided. The Markovian switching topology model of mobile networks is justified for certain node mobility models through empirically estimated conditional entropy measures.

Submitted 19 February, 2013; v1 submitted 10 January, 2013; originally announced January 2013.

arXiv:1102.3396 [pdf, ps, other]

Detecting Separation in Robotic and Sensor Networks

Authors: Chenda Liao, Harshavardhan Chenji, Prabir Barooah, Radu Stoleru, Tamás Kalmár-Nagy

Abstract: In this paper we consider the problem of detecting separation of agents from a base station in robotic and sensor networks. Such separation can be caused by mobility and/or failure of the agents. While separation/cut detection may be performed by passing messages between a node and the base in static networks, such a solution is impractical for networks with high mobility, since routes are constantly changing. We propose a distributed algorithm to detect separation from the base station. The algorithm consists of an averaging scheme in which every node updates a scalar state by communicating with its current neighbors. We prove that if a node is permanently disconnected from the base station, its state converges to $0$. If a node is connected to the base station in an average sense, even if not connected in any instant, then we show that the expected value of its state converges to a positive number. Therefore, a node can detect if it has been separated from the base station by monitoring its state. The effectiveness of the proposed algorithm is demonstrated through simulations, a real system implementation and experiments involving both static and mobile networks.

Submitted 16 February, 2011; originally announced February 2011.

Showing 1–43 of 43 results for author: Liao, C