-
Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network
Authors:
Anbo Cao,
Pin-Yu Le,
Zhonghui Qie,
Haseeb Hassan,
Yingwei Guo,
Asim Zaman,
Jiaxi Lu,
Xueqiang Zeng,
Huihui Yang,
Xiaoqiang Miao,
Taiyu Han,
Guangtao Huang,
Yan Kang,
Yu Luo,
Jia Guo
Abstract:
Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. Deep learning, however, can estimate clinical perfusion parameters accurately compared with traditional clinical approaches. Therefore, this study presents, for the first time, a perfusion parameter estimation network that considers both spatial and temporal information, the Spatiotemporal Network (ST-Net). The proposed network includes a purpose-designed physical loss function to further enhance model performance. The results indicate that the network can accurately estimate perfusion parameters, including cerebral blood volume (CBV), cerebral blood flow (CBF), and time to maximum of the residual function (Tmax). The structural similarity index (SSIM) mean values for CBV, CBF, and Tmax parameters were 0.952, 0.943, and 0.863, respectively. The DICE score for the hypo-perfused region reached 0.859, demonstrating high consistency. The proposed model also maintains time efficiency, closely approaching the performance of commercial gold-standard software.
Submitted 8 December, 2023;
originally announced December 2023.
-
Robust Autoencoders for Collective Corruption Removal
Authors:
Taihui Li,
Hengkang Wang,
Peng Le,
XianE Tang,
Ju Sun
Abstract:
Robust PCA is a standard tool for learning a linear subspace in the presence of sparse corruption or rare outliers. What about robustly learning manifolds that are more realistic models for natural data, such as images? There have been several recent attempts to generalize robust PCA to manifold settings. In this paper, we propose $\ell_1$- and scaling-invariant $\ell_1/\ell_2$-robust autoencoders based on a surprisingly compact formulation built on the intuition that deep autoencoders perform manifold learning. We demonstrate on several standard image datasets that the proposed formulation significantly outperforms all previous methods in collectively removing sparse corruption, without clean images for training. Moreover, we also show that the learned manifold structures can be generalized to unseen data samples effectively.
Submitted 5 March, 2023;
originally announced March 2023.
-
Pre-training for Speech Translation: CTC Meets Optimal Transport
Authors:
Phuong-Hang Le,
Hongyu Gong,
Changhan Wang,
Juan Pino,
Benjamin Lecouteux,
Didier Schwab
Abstract:
The gap between speech and text modalities is a major challenge in speech-to-text translation (ST). Different methods have been proposed to reduce this gap, but most of them require architectural changes in ST training. In this work, we propose to mitigate this issue at the pre-training stage, requiring no change in the ST model. First, we show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design. We provide a quantitative comparison with the more common cross-entropy loss, showing that pre-training with CTC consistently achieves better final ST accuracy. Nevertheless, CTC is only a partial solution and thus, in our second contribution, we propose a novel pre-training method combining CTC and optimal transport to further reduce this gap. Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space. Extensive experiments on the standard CoVoST-2 and MuST-C datasets show that our pre-training method applied to the vanilla encoder-decoder Transformer achieves state-of-the-art performance under the no-external-data setting, and performs on par with recent strong multi-task learning systems trained with external data. Finally, our method can also be applied on top of these multi-task systems, leading to further improvements for these models. Code and pre-trained models are available at https://github.com/formiel/fairseq.
Submitted 5 June, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Single-Image HDR Reconstruction by Multi-Exposure Generation
Authors:
Phuoc-Hieu Le,
Quynh Le,
Rang Nguyen,
Binh-Son Hua
Abstract:
High dynamic range (HDR) imaging is an indispensable technique in modern photography. Traditional methods focus on HDR reconstruction from multiple images, solving the core problems of image alignment, fusion, and tone mapping, yet lacking a perfect solution due to ghosting and other visual artifacts in the reconstruction. Recent attempts at single-image HDR reconstruction show a promising alternative: by learning to map pixel values to their irradiance using a neural network, one can bypass the align-and-merge pipeline completely yet still obtain a high-quality HDR image. In this work, we propose a weakly supervised learning method that inverts the physical image formation process for HDR reconstruction via learning to generate multiple exposures from a single image. Our neural network can invert the camera response to reconstruct pixel irradiance before synthesizing multiple exposures and hallucinating details in under- and over-exposed regions from a single input image. To train the network, we propose a representation loss, a reconstruction loss, and a perceptual loss applied on pairs of under- and over-exposure images and thus do not require HDR images for training. Our experiments show that our proposed model can effectively reconstruct HDR images. Our qualitative and quantitative results show that our method achieves state-of-the-art performance on the DrTMO dataset. Our code is available at https://github.com/VinAIResearch/single_image_hdr.
Submitted 28 October, 2022;
originally announced October 2022.
-
Joint UAV Placement and IRS Phase Shift Optimization in Downlink Networks
Authors:
Hung Nguyen-Kha,
Hieu V. Nguyen,
Mai T. P. Le,
Oh-Soon Shin
Abstract:
This study investigates the integration of an intelligent reflecting surface (IRS) into an unmanned aerial vehicle (UAV) platform to utilize the advantages of these leading technologies for sixth-generation communications, e.g., improved spectral and energy efficiency, extended network coverage, and flexible deployment. In particular, we investigate a downlink IRS-UAV system, wherein single-antenna ground users (UEs) are served by a multi-antenna base station (BS). To assist the communication between UEs and the BS, an IRS mounted on a UAV is deployed, in which the direct links are obstructed owing to the complex urban channel characteristics. The beamforming at the BS, phase shift at the IRS, and the 3D placement of the UAV are jointly optimized to maximize the sum rate. Because the optimization variables, particularly the beamforming and IRS phase shift, are highly coupled with each other, the optimization problem is naturally non-convex. To effectively solve the formulated problem, we propose an iterative algorithm that employs block coordinate descent and inner approximation methods. Numerical results demonstrate the effectiveness of our proposed approach for a UAV-mounted IRS system on the sum rate performance over the state-of-the-art technology using the terrestrial counterpart.
Submitted 2 February, 2023; v1 submitted 14 July, 2022;
originally announced July 2022.
-
Fisher Task Distance and Its Application in Neural Architecture Search
Authors:
Cat P. Le,
Mohammadreza Soltani,
Juncheng Dong,
Vahid Tarokh
Abstract:
We formulate an asymmetric (or non-commutative) distance between tasks based on Fisher Information Matrices, called Fisher task distance. This distance represents the complexity of transferring the knowledge from one task to another. We provide a proof of consistency for our distance through theorems and experiments on various classification tasks from MNIST, CIFAR-10, CIFAR-100, ImageNet, and Taskonomy datasets. Next, we construct an online neural architecture search framework using the Fisher task distance, in which we have access to the past learned tasks. By using the Fisher task distance, we can identify the closest learned tasks to the target task, and utilize the knowledge learned from these related tasks for the target task. Here, we show how the proposed distance between a target task and a set of learned tasks can be used to reduce the neural architecture search space for the target task. The complexity reduction in search space for task-specific architectures is achieved by building on the optimized architectures for similar tasks instead of doing a full search and without using this side information. Experimental results for tasks in MNIST, CIFAR-10, CIFAR-100, ImageNet datasets demonstrate the efficacy of the proposed approach and its improvements, in terms of the performance and the number of parameters, over other gradient-based search methods, such as ENAS, DARTS, PC-DARTS.
Submitted 30 April, 2022; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Code-domain NOMA in Massive MIMO: When is it Needed?
Authors:
Mai T. P. Le,
Luca Sanguinetti,
Emil Björnson,
Maria-Gabriella Di Benedetto
Abstract:
In overloaded Massive MIMO (mMIMO) systems, wherein the number $K$ of user equipments (UEs) exceeds the number of base station antennas $M$, it has recently been shown that non-orthogonal multiple access (NOMA) can increase the sum spectral efficiency. This paper aims at identifying cases where code-domain NOMA can improve the spectral efficiency of mMIMO in the classical regime where $K < M$. Novel spectral efficiency expressions are provided for the uplink and downlink with arbitrary spreading signatures and spatial correlation matrices. Particular attention is devoted to the planar arrays that are currently being deployed in pre-5G and 5G networks (in sub-6 GHz bands), which are characterized by limited spatial resolution. Numerical results show that mMIMO with such planar arrays can benefit from NOMA in scenarios where the UEs are spatially close to each other. A two-step UE grouping scheme is proposed for NOMA-aided mMIMO systems that is applicable to the spatial correlation matrices of the UEs that are currently active in each cell. Numerical results are used to investigate the performance of the algorithm under different operating conditions and types of spreading signatures (orthogonal, sparse and random sets). The analysis reveals that orthogonal signatures provide the highest average spectral efficiency.
Submitted 3 April, 2021; v1 submitted 2 March, 2020;
originally announced March 2020.
-
What is the Benefit of Code-domain NOMA in Massive MIMO?
Authors:
Mai T. P. Le,
Luca Sanguinetti,
Emil Björnson,
Maria-Gabriella Di Benedetto
Abstract:
In overloaded Massive MIMO systems, wherein the number K of user equipments (UEs) exceeds the number of base station antennas M, it has recently been shown that non-orthogonal multiple access (NOMA) can increase performance. This paper aims at identifying cases of the classical operating regime K < M, where code-domain NOMA can also improve the spectral efficiency of Massive MIMO. Particular attention is given to use cases in which poor favorable propagation conditions are experienced. Numerical results show that Massive MIMO with planar antenna arrays can benefit from NOMA in practical scenarios where the UEs are spatially close to each other.
Submitted 3 July, 2019;
originally announced July 2019.