Search | arXiv e-print repository

Generative Data Assimilation of Sparse Weather Station Observations at Kilometer Scales

Authors: Peter Manshausen, Yair Cohen, Jaideep Pathak, Mike Pritchard, Piyush Garg, Morteza Mardani, Karthik Kashinath, Simon Byrne, Noah Brenowitz

Abstract: Data assimilation of observational data into full atmospheric states is essential for weather forecast model initialization. Recently, methods for deep generative data assimilation have been proposed which allow for using new input data without retraining the model. They could also dramatically accelerate the costly data assimilation process used in operational regional weather models. Here, in a central US testbed, we demonstrate the viability of score-based data assimilation in the context of realistically complex km-scale weather. We train an unconditional diffusion model to generate snapshots of a state-of-the-art km-scale analysis product, the High Resolution Rapid Refresh. Then, using score-based data assimilation to incorporate sparse weather station data, the model produces maps of precipitation and surface winds. The generated fields display physically plausible structures, such as gust fronts, and sensitivity tests confirm learnt physics through multivariate relationships. Preliminary skill analysis shows the approach already outperforms a naive baseline of the High-Resolution Rapid Refresh system itself. By incorporating observations from 40 weather stations, 10% lower RMSEs on left-out stations are attained. Despite some lingering imperfections such as insufficiently disperse ensemble DA estimates, we find the results overall an encouraging proof of concept, and the first at km-scale. It is a ripe time to explore extensions that combine increasingly ambitious regional state generators with an increasing set of in situ, ground-based, and satellite remote sensing data streams.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures

ACM Class: J.2

arXiv:2406.16683 [pdf, other]

Repulsive Score Distillation for Diverse Sampling of Diffusion Models

Authors: Nicolas Zilberstein, Morteza Mardani, Santiago Segarra

Abstract: Score distillation sampling has been pivotal for integrating diffusion models into generation of complex visuals. Despite impressive results, it suffers from mode collapse and lack of diversity. To cope with this challenge, we leverage the gradient flow interpretation of score distillation to propose Repulsive Score Distillation (RSD). In particular, we propose a variational framework based on repulsion of an ensemble of particles that promotes diversity. Using a variational approximation that incorporates a coupling among particles, the repulsion appears as a simple regularization that allows interaction of particles based on their relative pairwise similarity, measured, e.g., via radial basis kernels. We design RSD for both unconstrained and constrained sampling scenarios. For constrained sampling we focus on inverse problems in the latent space, which leads to an augmented variational formulation that strikes a good balance between compute, quality and diversity. Our extensive experiments for text-to-image generation and inverse problems demonstrate that RSD achieves a superior trade-off between diversity and quality compared with state-of-the-art alternatives.

Submitted 24 June, 2024; originally announced June 2024.
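
The pairwise repulsion described in the RSD abstract above can be illustrated with a minimal sketch. It assumes a Gaussian radial basis kernel with a hand-picked bandwidth and computes, for each particle, the direction that lowers its summed similarity to the others. The function names and the bandwidth are hypothetical, and this is only the generic kernel-repulsion idea, not the authors' variational formulation.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian radial basis similarity between two particles (lists of floats)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def repulsion_directions(particles, bandwidth=1.0):
    """For each particle, return the negative gradient of its summed RBF
    similarity to all other particles (an assumed, generic repulsion term)."""
    directions = []
    for i, xi in enumerate(particles):
        push = [0.0] * len(xi)
        for j, xj in enumerate(particles):
            if i == j:
                continue
            k = rbf_kernel(xi, xj, bandwidth)
            for d in range(len(xi)):
                # -d/dxi of k(xi, xj) equals k * (xi - xj) / bandwidth^2
                push[d] += k * (xi[d] - xj[d]) / bandwidth ** 2
        directions.append(push)
    return directions

if __name__ == "__main__":
    particles = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
    for p, d in zip(particles, repulsion_directions(particles)):
        print(p, d)
```

Nearby particles receive a noticeable push apart, while a distant particle is barely affected because the kernel value decays with distance.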

arXiv:2405.08246 [pdf, other]

Compositional Text-to-Image Generation with Dense Blob Representations

Authors: Weili Nie, Sifei Liu, Morteza Mardani, Chao Liu, Benjamin Eckart, Arash Vahdat

Abstract: Existing text-to-image models struggle to follow complex text prompts, raising the need for extra grounding inputs for better controllability. In this work, we propose to decompose a scene into visual primitives - denoted as dense blob representations - that contain fine-grained details of the scene while being modular, human-interpretable, and easy-to-construct. Based on blob representations, we develop a blob-grounded text-to-image diffusion model, termed BlobGEN, for compositional generation. Particularly, we introduce a new masked cross-attention module to disentangle the fusion between blob representations and visual features. To leverage the compositionality of large language models (LLMs), we introduce a new in-context learning approach to generate blob representations from text prompts. Our extensive experiments show that BlobGEN achieves superior zero-shot generation quality and better layout-guided controllability on MS-COCO. When augmented by LLMs, our method exhibits superior numerical and spatial correctness on compositional image generation benchmarks. Project page: https://blobgen-2d.github.io.

Submitted 13 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2404.06517 [pdf, other]

DiffObs: Generative Diffusion for Global Forecasting of Satellite Observations

Authors: Jason Stock, Jaideep Pathak, Yair Cohen, Mike Pritchard, Piyush Garg, Dale Durran, Morteza Mardani, Noah Brenowitz

Abstract: This work presents an autoregressive generative diffusion model (DiffObs) to predict the global evolution of daily precipitation, trained on a satellite observational product, and assessed with domain-specific diagnostics. The model is trained to probabilistically forecast day-ahead precipitation. Nonetheless, it is stable for multi-month rollouts, which reveal a qualitatively realistic superposition of convectively coupled wave modes in the tropics. Cross-spectral analysis confirms successful generation of low frequency variations associated with the Madden-Julian oscillation, which regulates most subseasonal to seasonal predictability in the observed atmosphere, and convectively coupled moist Kelvin waves with approximately correct dispersion relationships. Despite secondary issues and biases, the results affirm the potential for a next generation of global diffusion models trained on increasingly sparse, and increasingly direct and differentiated observations of the world, for practical applications in subseasonal and climate prediction.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Published as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2024

arXiv:2401.04099 [pdf, other]

AGG: Amortized Generative 3D Gaussians for Single Image to 3D

Authors: Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat

Abstract: Given the growing need for automatic 3D content creation pipelines, various 3D representations have been studied to generate 3D objects from a single image. Due to its superior rendering efficiency, 3D Gaussian splatting-based models have recently excelled in both 3D reconstruction and generation. 3D Gaussian splatting approaches for image to 3D generation are often optimization-based, requiring many computationally expensive score-distillation steps. To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization. Utilizing an intermediate hybrid representation, AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. Moreover, we propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module. Our method is evaluated against existing optimization-based 3D Gaussian frameworks and sampling-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster. Project page: https://ir1d.github.io/AGG/

Submitted 8 January, 2024; originally announced January 2024.

Comments: Project page: https://ir1d.github.io/AGG/

arXiv:2310.01799 [pdf, other]

SMRD: SURE-based Robust MRI Reconstruction with Diffusion Models

Authors: Batu Ozturkler, Chao Liu, Benjamin Eckart, Morteza Mardani, Jiaming Song, Jan Kautz

Abstract: Diffusion models have recently gained popularity for accelerated MRI reconstruction due to their high sample quality. They can effectively serve as rich data priors while incorporating the forward model flexibly at inference time, and they have been shown to be more robust than unrolled methods under distribution shifts. However, diffusion models require careful tuning of inference hyperparameters on a validation set and are still sensitive to distribution shifts during testing. To address these challenges, we introduce SURE-based MRI Reconstruction with Diffusion models (SMRD), a method that performs test-time hyperparameter tuning to enhance robustness during testing. SMRD uses Stein's Unbiased Risk Estimator (SURE) to estimate the mean squared error of the reconstruction during testing. SURE is then used to automatically tune the inference hyperparameters and to set an early stopping criterion without the need for validation tuning. To the best of our knowledge, SMRD is the first to incorporate SURE into the sampling stage of diffusion models for automatic hyperparameter selection. SMRD outperforms diffusion model baselines on various measurement noise levels, acceleration factors, and anatomies, achieving a PSNR improvement of up to 6 dB under measurement noise. The code is publicly available at https://github.com/NVlabs/SMRD.

Submitted 18 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: MICCAI 2023

arXiv:2309.15214 [pdf, other]

Residual Diffusion Modeling for Km-scale Atmospheric Downscaling

Authors: Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Karthik Kashinath, Jan Kautz, Mike Pritchard

Abstract: Predictions of weather hazard require expensive km-scale simulations driven by coarser global inputs. Here, a cost-effective stochastic downscaling model is trained from a high-resolution 2-km weather model over Taiwan conditioned on 25-km ERA5 reanalysis. To address the multi-scale machine learning challenges of weather data, we employ a two-step approach, Corrector Diffusion (CorrDiff), where a UNet prediction of the mean is corrected by a diffusion step. Akin to Reynolds decomposition in fluid dynamics, this isolates generative learning to the stochastic scales. CorrDiff exhibits skillful RMSE and CRPS and faithfully recovers spectra and distributions even for extremes. Case studies of coherent weather phenomena reveal appropriate multivariate relationships reminiscent of learnt physics: the collocation of intense rainfall and sharp gradients in fronts, and extreme winds and rainfall bands near the eyewall of typhoons. Downscaling global forecasts successfully retains many of these benefits, foreshadowing the potential of end-to-end, global-to-km-scales machine learning weather predictions.

Submitted 9 December, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

arXiv:2305.04391 [pdf, other]

A Variational Perspective on Solving Inverse Problems with Diffusion Models

Authors: Morteza Mardani, Jiaming Song, Jan Kautz, Arash Vahdat

Abstract: Diffusion models have emerged as a key pillar of foundation models in visual domains. One of their critical applications is to universally solve different downstream inverse tasks via a single diffusion prior without re-training for each task. Most inverse tasks can be formulated as inferring a posterior distribution over data (e.g., a full image) given a measurement (e.g., a masked image). This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution. We show that our approach naturally leads to regularization by denoising diffusion process (RED-Diff) where denoisers at different timesteps concurrently impose different structural constraints over the image. To gauge the contribution of denoisers from different timesteps, we propose a weighting mechanism based on signal-to-noise-ratio (SNR). Our approach provides a new variational perspective for solving inverse problems with diffusion models, allowing us to formulate sampling as stochastic optimization, where one can simply apply off-the-shelf solvers with lightweight iterates. Our experiments for image restoration tasks such as inpainting and superresolution demonstrate the strengths of our method compared with state-of-the-art sampling-based diffusion models.

Submitted 29 September, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

arXiv:2208.05419 [pdf, ps, other]

FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators

Authors: Thorsten Kurth, Shashank Subramanian, Peter Harrington, Jaideep Pathak, Morteza Mardani, David Hall, Andrea Miele, Karthik Kashinath, Animashree Anandkumar

Abstract: Extreme weather amplified by climate change is causing increasingly devastating impacts across the globe. The current use of physics-based numerical weather prediction (NWP) limits accuracy due to high computational cost and strict time-to-solution limits. We report that a data-driven deep learning Earth system emulator, FourCastNet, can predict global weather and generate medium-range forecasts five orders-of-magnitude faster than NWP while approaching state-of-the-art accuracy. FourCastNet is optimized and scales efficiently on three supercomputing systems: Selene, Perlmutter, and JUWELS Booster up to 3,808 NVIDIA A100 GPUs, attaining 140.8 petaFLOPS in mixed precision (11.9% of peak at that scale). The time-to-solution for training FourCastNet measured on JUWELS Booster on 3,072 GPUs is 67.4 minutes, resulting in an 80,000 times faster time-to-solution relative to state-of-the-art NWP, in inference. FourCastNet produces accurate instantaneous weather predictions for a week in advance, enables enormous ensembles that better capture weather extremes, and supports higher global forecast resolutions.

Submitted 8 August, 2022; originally announced August 2022.

arXiv:2207.08393 [pdf, other]

GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction

Authors: Batu Ozturkler, Arda Sahiner, Tolga Ergen, Arjun D Desai, Christopher M Sandino, Shreyas Vasanawala, John M Pauly, Morteza Mardani, Mert Pilanci

Abstract: Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction. These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization. However, they require several iterations of a large neural network to handle high-dimensional imaging tasks such as 3D MRI. This limits traditional training algorithms based on backpropagation due to prohibitively large memory and compute requirements for calculating gradients and storing intermediate activations. To address this challenge, we propose Greedy LEarning for Accelerated MRI (GLEAM) reconstruction, an efficient training strategy for high-dimensional imaging settings. GLEAM splits the end-to-end network into decoupled network modules. Each module is optimized in a greedy manner with decoupled gradient updates, reducing the memory footprint during training. We show that the decoupled gradient updates can be performed in parallel on multiple graphical processing units (GPUs) to further reduce training time. We present experiments with 2D and 3D datasets including multi-coil knee, brain, and dynamic cardiac cine MRI. We observe that: i) GLEAM generalizes as well as state-of-the-art memory-efficient baselines such as gradient checkpointing and invertible networks with the same memory footprint, but with 1.3x faster training; ii) for the same memory footprint, GLEAM yields 1.1 dB PSNR gain in 2D and 1.8 dB in 3D over end-to-end baselines.

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2205.08078 [pdf, other]

Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, John Pauly, Morteza Mardani, Mert Pilanci

Abstract: Vision transformers using self-attention or its proposed alternatives have demonstrated promising results in many image related tasks. However, the underpinning inductive bias of attention is not well understood. To address this issue, this paper analyzes attention through the lens of convex duality. For the non-linear dot-product self-attention, and alternative mechanisms such as MLP-mixer and Fourier Neural Operator (FNO), we derive equivalent finite-dimensional convex problems that are interpretable and solvable to global optimality. The convex programs lead to block nuclear-norm regularization that promotes low rank in the latent feature and token dimensions. In particular, we show how self-attention networks implicitly cluster the tokens, based on their latent similarity. We conduct experiments for transferring a pre-trained transformer backbone for CIFAR-100 classification by fine-tuning a variety of convex attention heads. The results indicate the merits of the bias induced by attention compared with the existing MLP or linear heads.

Submitted 20 May, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

Comments: 38 pages, 2 figures. To appear in ICML 2022

arXiv:2202.11214 [pdf, other]

FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

Authors: Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, Animashree Anandkumar

Abstract: FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short to medium-range global predictions at $0.25^{\circ}$ resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as the surface wind speed, precipitation, and atmospheric water vapor. It has important implications for planning wind energy resources, predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, at short lead times for large-scale variables, while outperforming IFS for variables with complex fine-scale structure, including precipitation. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than IFS. The speed of FourCastNet enables the creation of rapid and inexpensive large-ensemble forecasts with thousands of ensemble-members for improving probabilistic forecasting. We discuss how data-driven deep learning models such as FourCastNet are a valuable addition to the meteorology toolkit to aid and augment NWP models.

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2111.13587 [pdf, other]

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers

Authors: John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro

Abstract: Vision transformers have delivered tremendous success in representation learning. This is primarily due to effective token mixing through self attention. However, this scales quadratically with the number of pixels, which becomes infeasible for high-resolution inputs. To cope with this challenge, we propose Adaptive Fourier Neural Operator (AFNO) as an efficient token mixer that learns to mix in the Fourier domain. AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution. This principle was previously used to design FNO, which solves global convolution efficiently in the Fourier domain and has shown promise in learning challenging PDEs. To handle challenges in visual representation learning such as discontinuities in images and high resolution inputs, we propose principled architectural modifications to FNO which result in memory and computational efficiency. This includes imposing a block-diagonal structure on the channel mixing weights, adaptively sharing weights across tokens, and sparsifying the frequency modes via soft-thresholding and shrinkage. The resulting model is highly parallel with a quasi-linear complexity and has linear memory in the sequence size. AFNO outperforms self-attention mechanisms for few-shot segmentation in terms of both efficiency and accuracy. For Cityscapes segmentation with the Segformer-B3 backbone, AFNO can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.

Submitted 27 March, 2022; v1 submitted 24 November, 2021; originally announced November 2021.
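
The AFNO abstract above mentions sparsifying frequency modes via soft-thresholding and shrinkage. As a minimal sketch of that generic operator only, the snippet below shrinks the magnitude of a complex coefficient by a threshold and zeroes it when the magnitude falls below the threshold. The function name and the threshold value are hypothetical, and how AFNO applies the operator inside its token mixer is not shown here.

```python
def soft_threshold(z, lam):
    """Shrink a complex coefficient toward zero by lam in magnitude;
    coefficients with |z| <= lam are set to exactly zero."""
    magnitude = abs(z)
    if magnitude <= lam:
        return 0j
    # Keep the phase of z, reduce its magnitude by lam.
    return z * (1.0 - lam / magnitude)

if __name__ == "__main__":
    # Illustrative Fourier coefficients (assumed values, not from the paper).
    modes = [3 + 4j, 0.1 - 0.1j, -2 + 0j]
    print([soft_threshold(z, 0.5) for z in modes])
```

Small coefficients vanish entirely, which is what makes the operator a sparsifier, while large ones keep their phase and lose a fixed amount of magnitude.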

arXiv:2107.05680 [pdf, other]

Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, Burak Bartan, John Pauly, Morteza Mardani, Mert Pilanci

Abstract: Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GDA), but it is unclear whether the optimization problem contains any saddle points, or whether heuristic methods can find them in practice. In this work, we analyze the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators expose the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games. Using this convex duality interpretation, we further demonstrate the impact of different activation functions of the discriminator. Our observations are verified with numerical results demonstrating the power of the convex interpretation, with applications in progressive training of convex architectures corresponding to linear generators and quadratic-activation discriminators for CelebA image generation. The code for our experiments is available at https://github.com/ardasahiner/ProCoGAN.

Submitted 21 March, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

Comments: Published as paper in ICLR 2022. First two authors contributed equally to this work; 34 pages, 11 figures

arXiv:2103.01499 [pdf, other]

Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization

Authors: Tolga Ergen, Arda Sahiner, Batu Ozturkler, John Pauly, Morteza Mardani, Mert Pilanci

Abstract: Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training of deep neural networks. Despite its empirical success, a full theoretical understanding of BN is yet to be developed. In this work, we analyze BN through the lens of convex optimization. We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial-time. Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes. Furthermore, we find that Gradient Descent provides an algorithmic bias effect on the standard non-convex BN network, and we design an approach to explicitly encode this implicit regularization into the convex objective. Experiments with CIFAR image classification highlight the effectiveness of this explicit regularization for mimicking and substantially improving the performance of standard BN networks.

Submitted 21 March, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

Comments: Accepted to ICLR 2022. First two authors contributed equally to this work; 36 pages, 13 figures

arXiv:2012.05169 [pdf, other]

Convex Regularization Behind Neural Reconstruction

Authors: Arda Sahiner, Morteza Mardani, Batu Ozturkler, Mert Pilanci, John Pauly

Abstract: Neural networks have shown tremendous potential for reconstructing high-resolution images in inverse problems. The non-convex and opaque nature of neural networks, however, hinders their utility in sensitive applications such as medical imaging. To cope with this challenge, this paper advocates a convex duality framework that makes a two-layer fully-convolutional ReLU denoising network amenable to convex optimization. The convex dual network not only offers the optimum training with convex solvers, but also facilitates interpreting training and prediction. In particular, it implies training neural networks with weight decay regularization induces path sparsity while the prediction is piecewise linear filtering. A range of experiments with MNIST and fastMRI datasets confirm the efficacy of the dual network optimization problem.

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2010.00003 [pdf, other]

Spectral Decomposition in Deep Networks for Segmentation of Dynamic Medical Images

Authors: Edgar A. Rios Piedra, Morteza Mardani, Frank Ong, Ukash Nakarmi, Joseph Y. Cheng, Shreyas Vasanawala

Abstract: Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a multi-phase technique widely used in clinical practice. DCE and similar datasets of dynamic medical data tend to contain redundant information on the spatial and temporal components that may not be relevant for detection of the object of interest and result in unnecessarily complex computer models with long training times that may also under-perform at test time due to the abundance of noisy heterogeneous data. This work attempts to increase the training efficacy and performance of deep networks by determining redundant information in the spatial and spectral components, and shows that segmentation accuracy can be maintained and potentially improved. Reported experiments include the evaluation of training/testing efficacy on a heterogeneous dataset composed of abdominal images of pediatric DCE patients, showing that drastic data reduction (higher than 80%) can preserve the dynamic information and performance of the segmentation model, while effectively suppressing noise and unwanted portions of the images.

Submitted 29 September, 2020; originally announced October 2020.

arXiv:1910.07048 [pdf, other]

doi 10.1109/TMI.2020.3022968

Wasserstein GANs for MR Imaging: from Paired to Unpaired Training

Authors: Ke Lei, Morteza Mardani, John M. Pauly, Shreyas S. Vasanawala

Abstract: Lack of ground-truth MR images impedes the common supervised training of neural networks for image reconstruction. To cope with this challenge, this paper leverages unpaired adversarial training for reconstruction networks, where the inputs are undersampled k-space and naively reconstructed images from one dataset, and the labels are high-quality images from another dataset. The reconstruction networks consist of a generator which suppresses the input image artifacts, and a discriminator using a pool of (unpaired) labels to adjust the reconstruction quality. The generator is an unrolled neural network -- a cascade of convolutional and data consistency layers. The discriminator is also a multilayer CNN that plays the role of a critic scoring the quality of reconstructed images based on the Wasserstein distance. Our experiments with knee MRI datasets demonstrate that the proposed unpaired training enables diagnostic-quality reconstruction when high-quality image labels are not available for the input types of interest, or when the amount of labels is small. In addition, our adversarial training scheme can achieve better image quality (as rated by expert radiologists) compared with the paired training schemes with pixel-wise loss.

Submitted 7 September, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

arXiv:1906.03742 [pdf, other]

Degrees of Freedom Analysis of Unrolled Neural Networks

Authors: Morteza Mardani, Qingyun Sun, Vardan Papyan, Shreyas Vasanawala, John Pauly, David Donoho

Abstract: Unrolled neural networks emerged recently as an effective model for learning inverse maps appearing in image restoration tasks. However, their generalization risk (i.e., test mean-squared-error) and its link to network design and training sample size remain mysterious. Leveraging Stein's Unbiased Risk Estimator (SURE), this paper analyzes the generalization risk with its bias and variance components for recurrent unrolled networks. We particularly investigate the degrees-of-freedom (DOF) component of SURE, the trace of the end-to-end network Jacobian, to quantify the prediction variance. We prove that DOF is well-approximated by the weighted path sparsity of the network under incoherence conditions on the trained weights. Empirically, we examine the SURE components as a function of training sample size for both recurrent and non-recurrent (with many more parameters) unrolled networks. Our key observations indicate that: 1) DOF increases with training sample size and converges to the generalization risk for both recurrent and non-recurrent schemes; 2) the recurrent network converges significantly faster (with fewer training samples) compared with the non-recurrent scheme, hence recurrence serves as a regularization for low sample size regimes.

Submitted 9 June, 2019; originally announced June 2019.

arXiv:1903.07824 [pdf, other]

Compressed Sensing: From Research to Clinical Practice with Data-Driven Learning

Authors: Joseph Y. Cheng, Feiyu Chen, Christopher Sandino, Morteza Mardani, John M. Pauly, Shreyas S. Vasanawala

Abstract: Compressed sensing in MRI enables high subsampling factors while maintaining diagnostic image quality. This technique enables shortened scan durations and/or improved image resolution. Further, compressed sensing can increase the diagnostic information and value from each scan performed. Overall, compressed sensing has significant clinical impact in improving the diagnostic quality and patient experience for imaging exams. However, a number of challenges exist when moving compressed sensing from research to the clinic. These challenges include hand-crafted image priors, sensitive tuning parameters, and long reconstruction times. Data-driven learning provides a solution to address these challenges. As a result, compressed sensing can have greater clinical impact. In this tutorial, we will review the compressed sensing formulation and outline steps needed to transform this formulation to a deep learning framework. Supplementary open source code in Python will be used to demonstrate this approach with open databases. Further, we will discuss considerations in applying data-driven compressed sensing in the clinical setting.

Submitted 19 March, 2019; originally announced March 2019.

Comments: Submitted to the Special Issue on Computational MRI: Compressed Sensing and Beyond in the IEEE Signal Processing Magazine

arXiv:1901.11228 [pdf, other]

Uncertainty Quantification in Deep MRI Reconstruction

Authors: Vineet Edupuganti, Morteza Mardani, Shreyas Vasanawala, John Pauly

Abstract: Reliable MRI is crucial for accurate interpretation in therapeutic and diagnostic tasks. However, undersampling during MRI acquisition as well as the overparameterized and non-transparent nature of deep learning (DL) leave substantial uncertainty about the accuracy of DL reconstruction. With this in mind, this study aims to quantify the uncertainty in image recovery with DL models. To this end, we first leverage variational autoencoders (VAEs) to develop a probabilistic reconstruction scheme that maps out (low-quality) short scans with aliasing artifacts to the diagnostic-quality ones. The VAE encodes the acquisition uncertainty in a latent code and naturally offers a posterior of the image from which one can generate pixel variance maps using Monte-Carlo sampling. Accurately predicting risk requires knowledge of the bias as well, for which we leverage Stein's Unbiased Risk Estimator (SURE) as a proxy for mean-squared-error (MSE). Extensive empirical experiments are performed for knee MRI reconstruction under different training losses (adversarial and pixel-wise) and unrolled recurrent network architectures. Our key observations indicate that: 1) adversarial losses introduce more uncertainty; and 2) recurrent unrolled nets reduce the prediction uncertainty and risk.

Submitted 25 April, 2020; v1 submitted 31 January, 2019; originally announced January 2019.

arXiv:1806.03963 [pdf, other]

Neural Proximal Gradient Descent for Compressive Imaging

Authors: Morteza Mardani, Qingyun Sun, Shreyas Vasanawala, Vardan Papyan, Hatef Monajemi, John Pauly, David Donoho

Abstract: Recovering high-resolution images from limited sensory data typically leads to a serious ill-posed inverse problem, demanding inversion algorithms that effectively capture the prior information. Learning a good inverse mapping from training data faces severe challenges, including: (i) scarcity of training data; (ii) need for plausible reconstructions that are physically feasible; (iii) need for fast reconstruction, especially in real-time applications. We develop a successful system solving all these challenges, using as basic architecture the recurrent application of the proximal gradient algorithm. We learn a proximal map that works well with real images based on residual networks. Contraction of the resulting map is analyzed, and incoherence conditions are investigated that drive the convergence of the iterates. Extensive experiments are carried out under different settings: (a) reconstructing abdominal MRI of pediatric patients from highly undersampled Fourier-space data and (b) superresolving natural face images. Our key findings include: 1. A recurrent ResNet with a single residual block unrolled from an iterative algorithm yields an effective proximal which accurately reveals MR image details. 2. Our architecture significantly outperforms conventional non-recurrent deep ResNets by 2dB SNR; it is also trained much more rapidly. 3. It outperforms state-of-the-art compressed-sensing Wavelet-based methods by 4dB SNR, with 100x speedups in reconstruction time.

Submitted 1 June, 2018; originally announced June 2018.

Comments: arXiv admin note: text overlap with arXiv:1711.10046

arXiv:1711.10046 [pdf, other]

Recurrent Generative Adversarial Networks for Proximal Learning and Automated Compressive Image Recovery

Authors: Morteza Mardani, Hatef Monajemi, Vardan Papyan, Shreyas Vasanawala, David Donoho, John Pauly

Abstract: Recovering images from undersampled linear measurements typically leads to an ill-posed linear inverse problem, that asks for proper statistical priors. Building effective priors is however challenged by the low train and test overhead dictated by real-time tasks; and the need for retrieving visually "plausible" and physically "feasible" images with minimal hallucination. To cope with these challenges, we design a cascaded network architecture that unrolls the proximal gradient iterations by permeating benefits from generative residual networks (ResNet) to modeling the proximal operator. A mixture of pixel-wise and perceptual costs is then deployed to train proximals. The overall architecture resembles back-and-forth projection onto the intersection of feasible and plausible images. Extensive computational experiments are examined for a global task of reconstructing MR images of pediatric patients, and a more local task of superresolving CelebA faces, that are insightful to design efficient architectures. Our observations indicate that for MRI reconstruction, a recurrent ResNet with a single residual block effectively learns the proximal. This simple architecture appears to significantly outperform the alternative deep ResNet architecture by 2dB SNR, and the conventional compressed-sensing MRI by 4dB SNR with 100x faster inference. For image superresolution, our preliminary results indicate that modeling the denoising proximal demands deep ResNets.

Submitted 27 November, 2017; originally announced November 2017.

Comments: 11 pages, 11 figures

arXiv:1706.00051 [pdf, other]

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

Authors: Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing

Abstract: Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear inverse task demanding time and resource intensive computations that can substantially trade off accuracy for speed in real-time imaging. In addition, state-of-the-art compressed sensing (CS) analytics are not cognizant of the image diagnostic quality. To cope with these challenges we put forth a novel CS framework that permeates benefits from generative adversarial networks (GAN) to train a (low-dimensional) manifold of diagnostic-quality MR images from historical patients. Leveraging a mixture of least-squares (LS) GANs and pixel-wise $\ell_1$ cost, a deep residual network with skip connections is trained as the generator that learns to remove the aliasing artifacts by projecting onto the manifold. LSGAN learns the texture details, while $\ell_1$ controls the high-frequency noise. A multilayer convolutional neural network is then jointly trained based on diagnostic quality images to discriminate the projection quality. The test phase performs feed-forward propagation over the generator network that demands a very low computational overhead. Extensive evaluations are performed on a large contrast-enhanced MR dataset of pediatric patients. In particular, images rated based on expert radiologists corroborate that GANCS retrieves high contrast images with detailed texture relative to conventional CS, and pixel-wise schemes. In addition, it offers reconstruction under a few milliseconds, two orders of magnitude faster than state-of-the-art CS-MRI schemes.

Submitted 31 May, 2017; originally announced June 2017.

arXiv:1609.04104 [pdf, ps, other]

Tracking Tensor Subspaces with Informative Random Sampling for Real-Time MR Imaging

Authors: Morteza Mardani, Georgios B. Giannakis, Kamil Ugurbil

Abstract: Magnetic resonance imaging (MRI) nowadays serves as an important modality for diagnostic and therapeutic guidance in clinics. However, the slow acquisition process, the dynamic deformation of organs, as well as the need for real-time reconstruction, pose major challenges toward obtaining artifact-free images. To cope with these challenges, the present paper advocates a novel subspace learning framework that permeates benefits from parallel factor (PARAFAC) decomposition of tensors (multiway data) to low-rank modeling of temporal sequence of images. Treating images as multiway data arrays, the novel method preserves spatial structures and unravels the latent correlations across various dimensions by means of the tensor subspace. Leveraging the spatio-temporal correlation of images, Tikhonov regularization is adopted as a rank surrogate for a least-squares optimization program. Alternating majorization minimization is adopted to develop online algorithms that recursively procure the reconstruction upon arrival of a new undersampled $k$-space frame. The developed algorithms are provably convergent and highly parallelizable with lightweight FFT tasks per iteration. To further accelerate the acquisition process, randomized subsampling policies are devised that leverage intermediate estimates of the tensor subspace, offered by the online scheme, to randomly acquire informative $k$-space samples. In a nutshell, the novel approach enables tracking motion dynamics under low acquisition rates 'on the fly.' GPU-based tests with real in vivo MRI datasets of cardiac cine images corroborate the merits of the novel approach relative to state-of-the-art alternatives.

Submitted 13 September, 2016; originally announced September 2016.

arXiv:1407.1660 [pdf, ps, other]

Estimating Traffic and Anomaly Maps via Network Tomography

Authors: Morteza Mardani, Georgios B. Giannakis

Abstract: Mapping origin-destination (OD) network traffic is pivotal for network management and proactive security tasks. However, lack of sufficient flow-level measurements as well as potential anomalies pose major challenges towards this goal. Leveraging the spatiotemporal correlation of nominal traffic, and the sparse nature of anomalies, this paper brings forth a novel framework to map out nominal and anomalous traffic, which treats jointly important network monitoring tasks including traffic estimation, anomaly detection, and traffic interpolation. To this end, a convex program is first formulated with nuclear and $\ell_1$-norm regularization to effect sparsity and low rank for the nominal and anomalous traffic with only the link counts and a small subset of OD-flow counts. Analysis and simulations confirm that the proposed estimator can exactly recover sufficiently low-dimensional nominal traffic and sporadic anomalies so long as the routing paths are sufficiently "spread-out" across the network, and an adequate amount of flow counts are randomly sampled. The results offer valuable insights about data acquisition strategies and network scenarios giving rise to accurate traffic estimation. For practical networks where the aforementioned conditions are possibly violated, the inherent spatiotemporal traffic patterns are taken into account by adopting a Bayesian approach along with a bilinear characterization of the nuclear and $\ell_1$ norms. The resultant nonconvex program involves quadratic regularizers with correlation matrices, learned systematically from (cyclo)stationary historical data. Alternating-minimization based algorithms with provable convergence are also developed to procure the estimates. Insightful tests with synthetic and real Internet data corroborate the effectiveness of the novel schemes.

Submitted 7 July, 2014; originally announced July 2014.

Comments: 16 pages, 9 Figures, submitted to IEEE/ACM Transactions on Networking

arXiv:1404.4667 [pdf, ps, other]

doi 10.1109/TSP.2015.2417491

Subspace Learning and Imputation for Streaming Big Data Matrices and Tensors

Authors: Morteza Mardani, Gonzalo Mateos, Georgios B. Giannakis

Abstract: Extracting latent low-dimensional structure from high-dimensional data is of paramount importance in timely inference tasks encountered with 'Big Data' analytics. However, increasingly noisy, heterogeneous, and incomplete datasets as well as the need for real-time processing of streaming data pose major challenges to this end. In this context, the present paper permeates benefits from rank minimization to scalable imputation of missing data, via tracking low-dimensional subspaces and unraveling latent (possibly multi-way) structure from incomplete streaming data. For low-rank matrix data, a subspace estimator is proposed based on an exponentially-weighted least-squares criterion regularized with the nuclear norm. After recasting the non-separable nuclear norm into a form amenable to online optimization, real-time algorithms with complementary strengths are developed and their convergence is established under simplifying technical assumptions. In a stationary setting, the asymptotic estimates obtained offer the well-documented performance guarantees of the batch nuclear-norm regularized estimator. Under the same unifying framework, a novel online (adaptive) algorithm is developed to obtain multi-way decompositions of low-rank tensors with missing entries, and perform imputation as a byproduct. Simulated tests with both synthetic as well as real Internet and cardiac magnetic resonance imagery (MRI) data confirm the efficacy of the proposed algorithms, and their superior performance relative to state-of-the-art alternatives.

Submitted 17 April, 2014; originally announced April 2014.

arXiv:1208.4043 [pdf, ps, other]

doi 10.1109/JSTSP.2012.2233193

Dynamic Anomalography: Tracking Network Anomalies via Sparsity and Low Rank

Authors: Morteza Mardani, Gonzalo Mateos, Georgios B. Giannakis

Abstract: In the backbone of large-scale networks, origin-to-destination (OD) traffic flows experience abrupt unusual changes known as traffic volume anomalies, which can result in congestion and limit the extent to which end-user quality of service requirements are met. As a means of maintaining seamless end-user experience in dynamic environments, as well as for ensuring network security, this paper deals with a crucial network monitoring task termed dynamic anomalography. Given link traffic measurements (noisy superpositions of unobserved OD flows) periodically acquired by backbone routers, the goal is to construct an estimated map of anomalies in real time, and thus summarize the network 'health state' along both the flow and time dimensions. Leveraging the low intrinsic-dimensionality of OD flows and the sparse nature of anomalies, a novel online estimator is proposed based on an exponentially-weighted least-squares criterion regularized with the sparsity-promoting $\ell_1$-norm of the anomalies, and the nuclear norm of the nominal traffic matrix. After recasting the non-separable nuclear norm into a form amenable to online optimization, a real-time algorithm for dynamic anomalography is developed and its convergence established under simplifying technical assumptions. For operational conditions where computational complexity reductions are at a premium, a lightweight stochastic gradient algorithm based on Nesterov's acceleration technique is developed as well. Comprehensive numerical tests with both synthetic and real network data corroborate the effectiveness of the proposed online algorithms and their tracking capabilities, and demonstrate that they outperform state-of-the-art approaches developed to diagnose traffic anomalies.

Submitted 20 August, 2012; originally announced August 2012.

Comments: 33 pages, 7 figures, submitted to the IEEE Journal of Selected Topics in Signal Processing - Special issue on `Anomalous pattern discovery for spatial, temporal, networked, and high-dimensional signals'

arXiv:1204.6537 [pdf, ps, other]

doi 10.1109/TIT.2013.2257913

Recovery of Low-Rank Plus Compressed Sparse Matrices with Application to Unveiling Traffic Anomalies

Authors: Morteza Mardani, Gonzalo Mateos, Georgios B. Giannakis

Abstract: Given the superposition of a low-rank matrix plus the product of a known fat compression matrix times a sparse matrix, the goal of this paper is to establish deterministic conditions under which exact recovery of the low-rank and sparse components becomes possible. This fundamental identifiability issue arises with traffic anomaly detection in backbone networks, and subsumes compressed sensing as well as the timely low-rank plus sparse matrix recovery tasks encountered in matrix decomposition problems. Leveraging the ability of $\ell_1$- and nuclear norms to recover sparse and low-rank matrices, a convex program is formulated to estimate the unknowns. Analysis and simulations confirm that the said convex program can recover the unknowns for sufficiently low-rank and sparse enough components, along with a compression matrix possessing an isometry property when restricted to operate on sparse vectors. When the low-rank, sparse, and compression matrices are drawn from certain random ensembles, it is established that exact recovery is possible with high probability. First-order algorithms are developed to solve the nonsmooth convex optimization problem with provable iteration complexity guarantees. Insightful tests with synthetic and real network data corroborate the effectiveness of the novel approach in unveiling traffic anomalies across flows and time, and its ability to outperform existing alternatives.

Submitted 29 April, 2012; originally announced April 2012.

Comments: 38 pages, submitted to the IEEE Transactions on Information Theory

arXiv:1203.1570 [pdf, ps, other]

doi 10.1109/TSP.2013.2279080

In-network Sparsity-regularized Rank Minimization: Algorithms and Applications

Authors: Morteza Mardani, Gonzalo Mateos, Georgios B. Giannakis

Abstract: Given a limited number of entries from the superposition of a low-rank matrix plus the product of a known fat compression matrix times a sparse matrix, recovery of the low-rank and sparse components is a fundamental task subsuming compressed sensing, matrix completion, and principal components pursuit. This paper develops algorithms for distributed sparsity-regularized rank minimization over networks, when the nuclear- and $\ell_1$-norm are used as surrogates to the rank and nonzero entry counts of the sought matrices, respectively. While nuclear-norm minimization has well-documented merits when centralized processing is viable, non-separability of the singular-value sum challenges its distributed minimization. To overcome this limitation, an alternative characterization of the nuclear norm is adopted which leads to a separable, yet non-convex cost minimized via the alternating-direction method of multipliers. The novel distributed iterations entail reduced-complexity per-node tasks, and affordable message passing among single-hop neighbors. Interestingly, upon convergence the distributed (non-convex) estimator provably attains the global optimum of its centralized counterpart, regardless of initialization. Several application domains are outlined to highlight the generality and impact of the proposed framework. These include unveiling traffic anomalies in backbone networks, predicting networkwide path latencies, and mapping the RF ambiance using wireless cognitive radios. Simulations with synthetic and real network data corroborate the convergence of the novel distributed algorithm, and its centralized performance guarantees.

Submitted 7 March, 2012; originally announced March 2012.

Comments: 30 pages, submitted for publication in the IEEE Trans. Signal Process.

arXiv:0811.4403 [pdf]

Joint Adaptive Modulation Coding and Cooperative ARQ over Relay Channels-Applications to Land Mobile Satellite Communications

Authors: Morteza Mardani, Jalil S. Harsini, Farshad Lahouti, Behrouz Eliasi

Abstract: In a cooperative relay network, a relay node (R) facilitates data transmission to the destination node (D), when the latter is unable to decode the source node (S) data correctly. This paper considers such a system model and presents a cross-layer approach to jointly design adaptive modulation and coding (AMC) at the physical layer and cooperative truncated automatic repeat request (ARQ) protocol at the data link layer. We first derive a closed form expression for the spectral efficiency of the joint cooperative ARQ-AMC scheme. Aiming at maximizing this performance measure, we then optimize two AMC schemes for S-D and R-D links, which directly satisfy a prescribed packet loss rate constraint. As an interesting application, we also consider the problem of joint link adaptation and blockage mitigation in land mobile satellite communications (LMSC). We also present a new relay-assisted transmission protocol for LMSC, which delivers the source data to the destination via the relaying link, when the S-D channel is in outage. Numerical results indicate that the proposed schemes noticeably enhance the spectral efficiency compared to a system which uses a conventional ARQ-AMC scheme at the S-D link, or a system which employs an optimized fixed rate cooperative-ARQ protocol.

Submitted 26 November, 2008; originally announced November 2008.

Comments: 24 pages, 7 figures, 1 table, Submitted to the International Journal on Wireless Communications and Mobile Computing, Wiley & Sons, Submitted July 2008

arXiv:0811.4397 [pdf]

doi 10.1109/ISWCS.2008.4726069

Joint Adaptive Modulation-Coding and Cooperative ARQ for Wireless Relay Networks

Authors: Morteza Mardani, Jalil S. Harsini, Farshad Lahouti, Behrouz Eliasi

Abstract: This paper presents a cross-layer approach to jointly design adaptive modulation and coding (AMC) at the physical layer and a cooperative truncated automatic repeat request (ARQ) protocol at the data link layer. We first derive an exact closed-form expression for the spectral efficiency of the proposed joint AMC-cooperative ARQ scheme. Aiming at maximizing this system performance measure, we then optimize an AMC scheme which directly satisfies a prescribed packet loss rate constraint at the data link layer. The results indicate that utilizing cooperative ARQ as a retransmission strategy noticeably enhances the spectral efficiency compared with a system that employs AMC alone at the physical layer. Moreover, the proposed adaptive-rate cooperative ARQ scheme outperforms its fixed-rate counterpart when the transmission modes at the source and relay are chosen based on the channel statistics. This in turn quantifies the possible gain achieved by the joint design of AMC and ARQ in wireless relay networks.

Submitted 26 November, 2008; originally announced November 2008.

Comments: 5 pages, 4 figures, To appear in the Proceedings of the 2008 IEEE International Symposium on Wireless Communication Systems (ISWCS), Reykjavik, Iceland, Oct 2008

arXiv:0811.4391 [pdf]

Cross-Layer Link Adaptation Design for Relay Channels with Cooperative ARQ Protocol

Authors: Morteza Mardani, Jalil S. Harsini, Farshad Lahouti

Abstract: Cooperative automatic repeat request (C-ARQ) is a link-layer relaying protocol which exploits spatial diversity and allows the relay node to retransmit the source data packet to the destination when the latter is unable to decode the source data correctly. This paper presents a cross-layer link adaptation design for C-ARQ based relay channels in which both source and relay nodes employ adaptive modulation and coding and power adaptation at the physical layer. For this scenario, we first derive closed-form expressions for the system spectral efficiency and average power consumption. We then present a low-complexity iterative algorithm to find the optimized adaptation solution by maximizing the spectral efficiency subject to a packet loss rate (PLR) constraint and an average power consumption constraint. The results indicate that the proposed adaptation scheme enhances the spectral efficiency noticeably when compared to other adaptive schemes, while guaranteeing the required PLR performance.

Submitted 17 February, 2009; v1 submitted 26 November, 2008; originally announced November 2008.

Comments: 6 pages, 3 figures, Submitted

Showing 1–33 of 33 results for author: Mardani, M