-
X-ray2CTPA: Generating 3D CTPA scans from 2D X-ray conditioning
Authors:
Noa Cahan,
Eyal Klang,
Galit Aviram,
Yiftach Barash,
Eli Konen,
Raja Giryes,
Hayit Greenspan
Abstract:
Chest X-rays or chest radiography (CXR), commonly used for medical diagnostics, typically enables limited imaging compared to computed tomography (CT) scans, which offer more detailed and accurate three-dimensional data, particularly contrast-enhanced scans like CT Pulmonary Angiography (CTPA). However, CT scans entail higher costs, greater radiation exposure, and are less accessible than CXRs. In this work we explore cross-modal translation from a 2D low contrast-resolution X-ray input to a 3D high contrast and spatial-resolution CTPA scan. Driven by recent advances in generative AI, we introduce a novel diffusion-based approach to this task. We evaluate the model's performance using both quantitative metrics and qualitative feedback from radiologists, ensuring diagnostic relevance of the generated images. Furthermore, we employ the synthesized 3D images in a classification framework and show improved AUC in a PE categorization task, using the initial CXR input. The proposed method is generalizable and capable of performing additional cross-modality translations in medical imaging. It may pave the way for more accessible and cost-effective advanced diagnostic tools. The code for this project is available: https://github.com/NoaCahan/X-ray2CTPA .
Submitted 25 June, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Deep Phase Coded Image Prior
Authors:
Nimrod Shabtay,
Eli Schwartz,
Raja Giryes
Abstract:
Phase-coded imaging is a computational imaging method designed to tackle tasks such as passive depth estimation and extended depth of field (EDOF) using depth cues inserted during image capture. Most of the current deep learning-based methods for depth estimation or all-in-focus imaging require a training dataset with high-quality depth maps and an optimal focus point at infinity for all-in-focus images. Such datasets are difficult to create, usually synthetic, and require external graphic programs. We propose a new method named "Deep Phase Coded Image Prior" (DPCIP) for jointly recovering the depth map and all-in-focus image from a coded-phase image using solely the captured image and the optical information of the imaging system. Our approach does not depend on any specific dataset and surpasses prior supervised techniques utilizing the same imaging system. This improvement is achieved through the utilization of a problem formulation based on implicit neural representation (INR) and deep image prior (DIP). Due to our zero-shot method, we overcome the barrier of acquiring accurate ground-truth data of depth maps and all-in-focus images for each new phase-coded system introduced. This allows focusing mainly on developing the imaging system, and not on ground-truth data collection.
Submitted 5 April, 2024;
originally announced April 2024.
-
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
Authors:
Noam Razin,
Yotam Alexander,
Edo Cohen-Karlik,
Raja Giryes,
Amir Globerson,
Nadav Cohen
Abstract:
In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (reinforcement learning). There, learning a controller applied to a system via gradient descent is known as policy gradient, and a question of prime importance is the extent to which a learned controller extrapolates to unseen initial states. This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states. Focusing on the fundamental Linear Quadratic Regulator (LQR) problem, we establish that the extent of extrapolation depends on the degree of exploration induced by the system when commencing from initial states included in training. Experiments corroborate our theory, and demonstrate its conclusions on problems beyond LQR, where systems are non-linear and controllers are neural networks. We hypothesize that real-world optimal control may be greatly improved by developing methods for informed selection of initial states to train on.
Submitted 1 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
A Self Supervised StyleGAN for Image Annotation and Classification with Extremely Limited Labels
Authors:
Dana Cohen Hochberg,
Hayit Greenspan,
Raja Giryes
Abstract:
The recent success of learning-based algorithms can be greatly attributed to the immense amount of annotated data used for training. Yet, many datasets lack annotations due to the high costs associated with labeling, resulting in degraded performances of deep learning methods. Self-supervised learning is frequently adopted to mitigate the reliance on massive labeled datasets since it exploits unlabeled data to learn relevant feature representations. In this work, we propose SS-StyleGAN, a self-supervised approach for image annotation and classification suitable for extremely small annotated datasets. This novel framework adds self-supervision to the StyleGAN architecture by integrating an encoder that learns the embedding to the StyleGAN latent space, which is well-known for its disentangled properties. The learned latent space enables the smart selection of representatives from the data to be labeled for improved classification performance. We show that the proposed method attains strong classification results using small labeled datasets of sizes 50 and even 10. We demonstrate the superiority of our approach for the tasks of COVID-19 and liver tumor pathology identification.
Submitted 26 December, 2023;
originally announced December 2023.
-
Tell Me What You See: Text-Guided Real-World Image Denoising
Authors:
Erez Yosef,
Raja Giryes
Abstract:
Image reconstruction from noisy sensor measurements is a challenging problem. Many solutions have been proposed for it, where the main approach is learning a good natural image prior along with modeling the true statistics of the noise in the scene. In the presence of very low lighting conditions, such approaches are usually not enough, and additional information is required, e.g., in the form of multiple captures. As an alternative, we suggest adding a description of the scene as a prior, which can be easily done by the photographer capturing the scene. Inspired by the remarkable success of diffusion models for image generation, using a text-guided diffusion model we show that adding image caption information significantly improves image denoising and reconstruction on both synthetic and real-world images.
Submitted 29 May, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Deep Internal Learning: Deep Learning from a Single Input
Authors:
Tom Tirer,
Raja Giryes,
Se Young Chun,
Yonina C. Eldar
Abstract:
Deep learning, in general, focuses on training a neural network from large labeled datasets. Yet, in many cases there is value in training a network just from the input at hand. This is particularly relevant in many signal and image processing problems where training data is scarce and diversity is large on the one hand, and on the other, there is a lot of structure in the data that can be exploited. Using this information is the key to deep internal-learning strategies, which may involve training a network from scratch using a single input or adapting an already trained network to a provided input example at inference time. This survey paper aims at covering deep internal-learning techniques that have been proposed in the past few years for these two important directions. While our main focus will be on image processing problems, most of the approaches that we survey are derived for general signals (vectors with recurring patterns that can be distinguished from noise) and are therefore applicable to other modalities.
Submitted 8 April, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
UDPM: Upsampling Diffusion Probabilistic Models
Authors:
Shady Abu-Hussein,
Raja Giryes
Abstract:
Denoising Diffusion Probabilistic Models (DDPM) have recently gained significant attention. DDPMs compose a Markovian process that begins in the data domain and gradually adds noise until reaching pure white noise. DDPMs generate high-quality samples from complex data distributions by defining an inverse process and training a deep neural network to learn this mapping. However, these models are inefficient because they require many diffusion steps to produce aesthetically pleasing samples. Additionally, unlike generative adversarial networks (GANs), the latent space of diffusion models is less interpretable. In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM). In the forward process, we reduce the latent variable dimension through downsampling, followed by the traditional noise perturbation. As a result, the reverse process gradually denoises and upsamples the latent variable to produce a sample from the data distribution. We formalize the Markovian diffusion processes of UDPM and demonstrate its generation capabilities on the popular FFHQ, AFHQv2, and CIFAR10 datasets. UDPM generates images with as few as three network evaluations, whose overall computational cost is less than a single DDPM or EDM step, while achieving an FID score of 6.86. This surpasses current state-of-the-art efficient diffusion models that use a single denoising step for sampling. Additionally, UDPM offers an interpretable and interpolable latent space, which gives it an advantage over traditional DDPMs. Our code is available online: https://github.com/shadyabh/UDPM/
Submitted 27 May, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
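The forward process described in the UDPM abstract above (downsample the latent, then apply the usual noise perturbation) can be illustrated with a rough sketch. The abstract does not give the downsampling operator or the noise schedule, so the 2x2 average pooling, the scalars `alpha` and `sigma`, and the function name `forward_step` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def forward_step(x, alpha, sigma, rng):
    """One hypothetical forward step: 2x average-pool downsampling, then Gaussian noise.

    The pooling operator and the (alpha, sigma) scaling are illustrative assumptions.
    """
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "spatial dimensions must be even"
    # 2x2 average pooling halves each spatial dimension.
    pooled = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # Standard diffusion-style perturbation: scale the signal and add white noise.
    noise = rng.standard_normal(pooled.shape)
    return alpha * pooled + sigma * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 32))  # stand-in for a data sample
latents = [x0]
for _ in range(3):  # three steps, echoing the "three network evaluations" in the abstract
    latents.append(forward_step(latents[-1], alpha=0.9, sigma=0.4, rng=rng))
shapes = [z.shape for z in latents]
```

Each step shrinks the latent, so a reverse process that mirrors it would have to denoise and upsample at every step, which is the behaviour the abstract describes.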
-
3D Masked Autoencoders with Application to Anomaly Detection in Non-Contrast Enhanced Breast MRI
Authors:
Daniel M. Lang,
Eli Schwartz,
Cosmin I. Bercea,
Raja Giryes,
Julia A. Schnabel
Abstract:
Self-supervised models allow (pre-)training on unlabeled data and therefore have the potential to overcome the need for large annotated cohorts. One leading self-supervised model is the masked autoencoder (MAE) which was developed on natural imaging data. The MAE is masking out a high fraction of visual transformer (ViT) input patches, to then recover the uncorrupted images as a pretraining task. In this work, we extend MAE to perform anomaly detection on breast magnetic resonance imaging (MRI). This new model, coined masked autoencoder for medical imaging (MAEMI) is trained on two non-contrast enhanced MRI sequences, aiming at lesion detection without the need for intravenous injection of contrast media and temporal image acquisition. During training, only non-cancerous images are presented to the model, with the purpose of localizing anomalous tumor regions during test time. We use a public dataset for model development. Performance of the architecture is evaluated in reference to subtraction images created from dynamic contrast enhanced (DCE)-MRI.
Submitted 10 March, 2023;
originally announced March 2023.
-
ADIR: Adaptive Diffusion for Image Reconstruction
Authors:
Shady Abu-Hussein,
Tom Tirer,
Raja Giryes
Abstract:
In recent years, denoising diffusion models have demonstrated outstanding image generation performance. The information on natural images captured by these models is useful for many image reconstruction applications, where the task is to restore a clean image from its degraded observations. In this work, we propose a conditional sampling scheme that exploits the prior learned by diffusion models while retaining agreement with the observations. We then combine it with a novel approach for adapting pretrained diffusion denoising networks to their input. We examine two adaption strategies: the first uses only the degraded image, while the second, which we advocate, is performed using images that are "nearest neighbors" of the degraded image, retrieved from a diverse dataset using an off-the-shelf visual-language model. To evaluate our method, we test it on two state-of-the-art publicly available diffusion models, Stable Diffusion and Guided Diffusion. We show that our proposed "adaptive diffusion for image reconstruction" (ADIR) approach achieves a significant improvement in the super-resolution, deblurring, and text-based editing tasks.
Submitted 6 December, 2022;
originally announced December 2022.
-
Denoiser-based projections for 2-D super-resolution multi-reference alignment
Authors:
Jonathan Shani,
Tom Tirer,
Raja Giryes,
Tamir Bendory
Abstract:
We study the 2-D super-resolution multi-reference alignment (SR-MRA) problem: estimating an image from its down-sampled, circularly-translated, and noisy copies. The SR-MRA problem serves as a mathematical abstraction of the structure determination problem for biological molecules. Since the SR-MRA problem is ill-posed without prior knowledge, accurate image estimation relies on designing priors that well-describe the statistics of the images of interest. In this work, we build on recent advances in image processing, and harness the power of denoisers as priors of images. In particular, we suggest to use denoisers as projections, and design two computational frameworks to estimate the image: projected expectation-maximization and projected method of moments. We provide an efficient GPU implementation, and demonstrate the effectiveness of these algorithms by extensive numerical experiments on a wide range of parameters and images.
Submitted 2 May, 2024; v1 submitted 10 April, 2022;
originally announced April 2022.
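The SR-MRA observation model in the abstract above (down-sampled, circularly-translated, noisy copies of one image) can be simulated with a short sketch. The abstract does not fix the order of operations or the sampling pattern; this sketch assumes the circular shift is applied before plain decimation and that the noise is white Gaussian. The function name `sr_mra_observations` and its parameters are hypothetical.

```python
import numpy as np

def sr_mra_observations(image, n_obs, factor, sigma, rng):
    """Simulate SR-MRA data: random circular shift, decimation, additive Gaussian noise.

    Assumed model (not taken from the paper): y_i = D(shift_i(x)) + sigma * noise_i.
    """
    h, w = image.shape
    observations = []
    for _ in range(n_obs):
        # Draw a uniformly random 2-D circular translation.
        dy, dx = int(rng.integers(0, h)), int(rng.integers(0, w))
        shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
        # Down-sample by keeping every `factor`-th pixel along each axis.
        low_res = shifted[::factor, ::factor]
        observations.append(low_res + sigma * rng.standard_normal(low_res.shape))
    return np.stack(observations)

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))  # stand-in for the unknown image
obs = sr_mra_observations(image, n_obs=5, factor=2, sigma=0.1, rng=rng)
```

Because the shifts are unknown and each copy keeps only a fraction of the pixels, recovering `image` from `obs` is ill-posed without a prior, which is the motivation the abstract gives for using denoisers as projections.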
-
Shallow Transits -- Deep Learning II: Identify Individual Exoplanetary Transits in Red Noise using Deep Learning
Authors:
Elad Dvash,
Yam Peleg,
Shay Zucker,
Raja Giryes
Abstract:
In a previous paper, we have introduced a deep learning neural network that should be able to detect the existence of very shallow periodic planetary transits in the presence of red noise. The network in that feasibility study would not provide any further details about the detected transits. The current paper completes this missing part. We present a neural network that tags samples that were obtained during transits. This is essentially similar to the task of identifying the semantic context of each pixel in an image -- an important task in computer vision, called "semantic segmentation", which is often performed by deep neural networks. The neural network we present makes use of novel deep learning concepts such as U-Nets, Generative Adversarial Networks (GAN), and adversarial loss. The resulting segmentation should allow further studies of the light curves which are tagged as containing transits. This approach towards the detection and study of very shallow transits is bound to play a significant role in future space-based transit surveys such as PLATO, which are specifically aimed to detect those extremely difficult cases of long-period shallow transits. Our segmentation network also adds to the growing toolbox of deep learning approaches which are being increasingly used in the study of exoplanets, but so far mainly for vetting transits, rather than their initial detection.
Submitted 15 March, 2022;
originally announced March 2022.
-
Video Reconstruction from a Single Motion Blurred Image using Learned Dynamic Phase Coding
Authors:
Erez Yosef,
Shay Elmalem,
Raja Giryes
Abstract:
Video reconstruction from a single motion-blurred image is a challenging problem, which can enhance the capabilities of existing cameras. Recently, several works addressed this task using conventional imaging and deep learning. Yet, such purely-digital methods are inherently limited, due to direction ambiguity and noise sensitivity. Some works proposed to address these limitations using non-conventional image sensors, however, such sensors are extremely rare and expensive. To circumvent these limitations with simpler means, we propose a hybrid optical-digital method for video reconstruction that requires only simple modifications to existing optical systems. We use a learned dynamic phase-coding in the lens aperture during the image acquisition to encode the motion trajectories, which serve as prior information for the video reconstruction process. The proposed computational camera generates a sharp frame burst of the scene at various frame rates from a single coded motion-blurred image, using an image-to-video convolutional neural network. We present advantages and improved performance compared to existing methods, using both simulations and a real-world camera prototype. We extend our optical coding also to video frame interpolation and present robust and improved results for noisy videos.
Submitted 18 December, 2022; v1 submitted 27 December, 2021;
originally announced December 2021.
-
MISS GAN: A Multi-IlluStrator Style Generative Adversarial Network for image to illustration translation
Authors:
Noa Barzilay,
Tal Berkovitz Shalev,
Raja Giryes
Abstract:
Unsupervised style transfer that supports diverse input styles using only one trained generator is a challenging and interesting task in computer vision. This paper proposes a Multi-IlluStrator Style Generative Adversarial Network (MISS GAN) that is a multi-style framework for unsupervised image-to-illustration translation, which can generate styled yet content preserving images. The illustrations dataset is a challenging one since it is comprised of illustrations of seven different illustrators, hence contains diverse styles. Existing methods either require training several generators (one per illustrator) to handle the different illustrators' styles, which limits their practical usage, or require training an image-specific network, which ignores the style information provided in other images of the illustrator. MISS GAN is both input image specific and uses the information of other images using only one trained model.
Submitted 12 August, 2021;
originally announced August 2021.
-
Separable Joint Blind Deconvolution and Demixing
Authors:
Dana Weitzner,
Raja Giryes
Abstract:
Blind deconvolution and demixing is the problem of reconstructing convolved signals and kernels from the sum of their convolutions. This problem arises in many applications, such as blind MIMO. This work presents a separable approach to blind deconvolution and demixing via convex optimization. Unlike previous works, our formulation allows separation into smaller optimization problems, which significantly improves complexity. We develop recovery guarantees, which comply with those of the original non-separable problem, and demonstrate the method performance under several normalization constraints.
Submitted 4 February, 2021;
originally announced February 2021.
-
Image Restoration by Deep Projected GSURE
Authors:
Shady Abu-Hussein,
Tom Tirer,
Se Young Chun,
Yonina C. Eldar,
Raja Giryes
Abstract:
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution. In recent years, solutions that are based on deep Convolutional Neural Networks (CNNs) have shown great promise. Yet, most of these techniques, which train CNNs using external data, are restricted to the observation models that have been used in the training phase. A recent alternative that does not have this drawback relies on learning the target image using internal learning. One such prominent example is the Deep Image Prior (DIP) technique that trains a network directly on the input image with a least-squares loss. In this paper, we propose a new image restoration framework that is based on minimizing a loss function that includes a "projected-version" of the Generalized Stein Unbiased Risk Estimator (GSURE) and parameterization of the latent image by a CNN. We demonstrate two ways to use our framework. In the first one, where no explicit prior is used, we show that the proposed approach outperforms other internal learning methods, such as DIP. In the second one, we show that our GSURE-based loss leads to improved performance when used within a plug-and-play priors scheme.
Submitted 4 February, 2021;
originally announced February 2021.
-
An Interpretation of Regularization by Denoising and its Application with the Back-Projected Fidelity Term
Authors:
Einav Yogev-Ofer,
Tom Tirer,
Raja Giryes
Abstract:
The vast majority of image recovery tasks are ill-posed problems. As such, methods that are based on optimization use cost functions that consist of both fidelity and prior (regularization) terms. A recent line of works imposes the prior by the Regularization by Denoising (RED) approach, which exploits the good performance of existing image denoising engines. Yet, the relation of RED to explicit prior terms is still not well understood, as previous work requires overly strong assumptions on the denoisers. In this paper, we make two contributions. First, we show that the RED gradient can be seen as a (sub)gradient of a prior function, but taken at a denoised version of the point. As RED is typically applied with a relatively small noise level, this interpretation indicates a similarity between RED and traditional gradients. This leads to our second contribution: We propose to combine RED with the Back-Projection (BP) fidelity term rather than the common Least Squares (LS) term that is used in previous works. We show that the advantages of BP over LS for image deblurring and super-resolution, which have been demonstrated for traditional gradients, carry over to the RED approach.
Submitted 27 January, 2021;
originally announced January 2021.
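The contrast between the Least Squares (LS) and Back-Projection (BP) fidelity terms mentioned in the abstract above can be made concrete with a small numerical sketch. It assumes a linear observation model y = A x and writes the BP term in its commonly used form, 0.5 * ||A^+ (A x - y)||^2 with A^+ the Moore-Penrose pseudo-inverse; the abstract itself gives no formula, so this form, the random matrix `A`, and the function names are assumptions for illustration.

```python
import numpy as np

def ls_fidelity(A, x, y):
    """Least-squares fidelity: 0.5 * ||A x - y||^2."""
    r = A @ x - y
    return 0.5 * float(r @ r)

def bp_fidelity(A, x, y):
    """Back-projection fidelity (assumed form): 0.5 * ||A^+ (A x - y)||^2."""
    r = np.linalg.pinv(A) @ (A @ x - y)
    return 0.5 * float(r @ r)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))   # hypothetical under-determined operator
x_true = rng.standard_normal(50)
y = A @ x_true                      # noiseless observations for simplicity

# The back-projected estimate A^+ y is consistent with the observations,
# so both fidelity terms are (numerically) zero there.
x_bp = np.linalg.pinv(A) @ y
ls_at_bp = ls_fidelity(A, x_bp, y)
bp_at_bp = bp_fidelity(A, x_bp, y)
```

The two terms share the same zero set when A has full row rank, but they weight the residual differently, which is why the choice of fidelity term can change the behaviour of iterative schemes such as RED.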
-
GMM-Based Generative Adversarial Encoder Learning
Authors:
Yuri Feigin,
Hedva Spitzer,
Raja Giryes
Abstract:
While GAN is a powerful model for generating images, its inability to infer a latent space directly limits its use in applications requiring an encoder. Our paper presents a simple architectural setup that combines the generative capabilities of GAN with an encoder. We accomplish this by combining the encoder with the discriminator using shared weights, then training them simultaneously using a new loss term. We model the output of the encoder latent space via a GMM, which leads to both good clustering using this latent space and improved image generation by the GAN. Our framework is generic and can be easily plugged into any GAN strategy. In particular, we demonstrate it both with Vanilla GAN and Wasserstein GAN, where in both it leads to an improvement in the generated images in terms of both the IS and FID scores. Moreover, we show that our encoder learns a meaningful representation as its clustering results are competitive with the current GAN-based state-of-the-art in clustering.
Submitted 8 December, 2020;
originally announced December 2020.
-
Deep Sparse Light Field Refocusing
Authors:
Shachar Ben Dayan,
David Mendlovic,
Raja Giryes
Abstract:
Light field photography enables recording 4D images, containing angular information alongside spatial information of the scene. One of the important applications of light field imaging is post-capture refocusing. Current methods require a dense field of angle views for this purpose; those can be acquired with a micro-lens system or with a compressive system. Both techniques have major drawbacks to consider, including bulky structures and an angular-spatial resolution trade-off. We present a novel implementation of digital refocusing based on sparse angular information using neural networks. This allows recording high spatial resolution in favor of the angular resolution, thus enabling the design of compact and simple devices with improved hardware as well as better performance of compressive systems. We use a novel convolutional neural network whose relatively small structure enables fast reconstruction with low memory consumption. Moreover, it handles various refocusing ranges and noise levels without re-training. Results show major improvement compared to existing methods.
Submitted 5 September, 2020;
originally announced September 2020.
-
Face Authentication from Grayscale Coded Light Field
Authors:
Dana Weitzner,
David Mendlovic,
Raja Giryes
Abstract:
Face verification is a fast-growing authentication tool for everyday systems, such as smartphones. While current 2D face recognition methods are very accurate, it has been suggested recently that one may wish to add a 3D sensor to such solutions to make them more reliable and robust to spoofing, e.g., using a 2D print of a person's face. Yet, this requires an additional relatively expensive depth sensor. To mitigate this, we propose a novel authentication system, based on slim grayscale coded light field imaging. We provide a reconstruction free fast anti-spoofing mechanism, working directly on the coded image. It is followed by a multi-view, multi-modal face verification network that given grayscale data together with a low-res depth map achieves competitive results to the RGB case. We demonstrate the effectiveness of our solution on a simulated 3D (RGBD) version of LFW, which will be made public, and a set of real faces acquired by a light field computational camera.
Submitted 31 May, 2020;
originally announced June 2020.
-
BP-DIP: A Backprojection based Deep Image Prior
Authors:
Jenny Zukerman,
Tom Tirer,
Raja Giryes
Abstract:
Deep neural networks are a very powerful tool for many computer vision tasks, including image restoration, exhibiting state-of-the-art results. However, the performance of deep learning methods tends to drop once the observation model used in training mismatches the one in test time. In addition, most deep learning methods require vast amounts of training data, which are not accessible in many applications. To mitigate these disadvantages, we propose to combine two image restoration approaches: (i) Deep Image Prior (DIP), which trains a convolutional neural network (CNN) from scratch in test time using the given degraded image. It does not require any training data and builds on the implicit prior imposed by the CNN architecture; and (ii) a backprojection (BP) fidelity term, which is an alternative to the standard least squares loss that is usually used in previous DIP works. We demonstrate the performance of the proposed method, termed BP-DIP, on the deblurring task and show its advantages over the plain DIP, with both higher PSNR values and better inference run-time.
Submitted 30 June, 2020; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Motion Deblurring using Spatiotemporal Phase Aperture Coding
Authors:
Shay Elmalem,
Raja Giryes,
Emanuel Marom
Abstract:
Motion blur is a known issue in photography, as it limits the exposure time while capturing moving objects. Extensive research has been carried out to compensate for it. In this work, a computational imaging approach for motion deblurring is proposed and demonstrated. Using dynamic phase-coding in the lens aperture during the image acquisition, the trajectory of the motion is encoded in an intermediate optical image. This encoding embeds both the motion direction and extent by coloring the spatial blur of each object. The color cues serve as prior information for a blind deblurring process, implemented using a convolutional neural network (CNN) trained to utilize such coding for image restoration. We demonstrate the advantage of the proposed approach over blind-deblurring with no coding and other solutions that use coded acquisition, both in simulation and real-world experiments.
Submitted 18 February, 2020;
originally announced February 2020.
-
Correction Filter for Single Image Super-Resolution: Robustifying Off-the-Shelf Deep Super-Resolvers
Authors:
Shady Abu Hussein,
Tom Tirer,
Raja Giryes
Abstract:
The single image super-resolution task is one of the most examined inverse problems in the past decade. In recent years, Deep Neural Networks (DNNs) have shown superior performance over alternative methods when the acquisition process uses a fixed known downsampling kernel, typically a bicubic kernel. However, several recent works have shown that in practical scenarios, where the test data mismatch the training data (e.g. when the downsampling kernel is not the bicubic kernel or is not available at training), the leading DNN methods suffer from a huge performance drop. Inspired by the literature on generalized sampling, in this work we propose a method for improving the performance of DNNs that have been trained with a fixed kernel on observations acquired by other kernels. For a known kernel, we design a closed-form correction filter that modifies the low-resolution image to match one which is obtained by another kernel (e.g. bicubic), and thus improves the results of existing pre-trained DNNs. For an unknown kernel, we extend this idea and propose an algorithm for blind estimation of the required correction filter. We show that our approach outperforms other super-resolution methods, which are designed for general downsampling kernels.
Submitted 24 May, 2020; v1 submitted 30 November, 2019;
originally announced December 2019.
-
Deep Radar Detector
Authors:
Daniel Brodeski,
Igal Bilik,
Raja Giryes
Abstract:
While camera and LiDAR processing have been revolutionized since the introduction of deep learning, radar processing still relies on classical tools. In this paper, we introduce a deep learning approach for radar processing, working directly with the radar complex data. To overcome the lack of radar labeled data, we rely in training only on the radar calibration data and introduce new radar augmentation techniques. We evaluate our method on the radar 4D detection task and demonstrate superior performance compared to the classical approaches while keeping real-time performance. Applying deep learning on radar data has several advantages such as eliminating the need for an expensive radar calibration process each time and enabling classification of the detected objects with almost zero-overhead.
Submitted 26 June, 2019;
originally announced June 2019.
-
Image-Adaptive GAN based Reconstruction
Authors:
Shady Abu Hussein,
Tom Tirer,
Raja Giryes
Abstract:
In recent years, there has been a significant improvement in the quality of samples produced by (deep) generative models such as variational auto-encoders and generative adversarial networks. However, the representation capabilities of these methods still do not capture the full distribution for complex classes of images, such as human faces. This deficiency has been clearly observed in previous works that use pre-trained generative models to solve imaging inverse problems. In this paper, we suggest mitigating the limited representation capabilities of generators by making them image-adaptive and enforcing compliance of the restoration with the observations via back-projections. We empirically demonstrate the advantages of our proposed approach for image super-resolution and compressed sensing.
Submitted 25 November, 2019; v1 submitted 12 June, 2019;
originally announced June 2019.
-
Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data
Authors:
Roee Levy Leshem,
Raja Giryes
Abstract:
This paper introduces Taco-VC, a novel architecture for voice conversion based on the Tacotron synthesizer, which is a sequence-to-sequence model with attention. The training of multi-speaker voice conversion systems requires a large number of resources, both in training and corpus size. Taco-VC is implemented using a single speaker Tacotron synthesizer based on Phonetic PosteriorGrams (PPGs) and a single speaker WaveNet vocoder conditioned on mel spectrograms. To enhance the converted speech quality, and to overcome over-smoothing, the outputs of Tacotron are passed through a novel speech enhancement network, which is composed of a combination of the phoneme recognition and Tacotron networks. Our system is trained just with a single speaker corpus and adapts to new speakers using only a few minutes of training data. Using mid-size public datasets, our method outperforms the baseline in the VCC 2018 SPOKE non-parallel voice conversion task and achieves competitive results compared to multi-speaker networks trained on large private datasets.
Submitted 19 June, 2020; v1 submitted 6 April, 2019;
originally announced April 2019.
-
TOP-GAN: Label-Free Cancer Cell Classification Using Deep Learning with a Small Training Set
Authors:
Moran Rubin,
Omer Stein,
Nir A. Turko,
Yoav Nygate,
Darina Roitshtain,
Lidor Karako,
Itay Barnea,
Raja Giryes,
Natan T. Shaked
Abstract:
We propose a new deep learning approach for medical imaging that copes with the problem of a small training set, the main bottleneck of deep learning, and apply it for classification of healthy and cancer cells acquired by quantitative phase imaging. The proposed method, called transferring of pre-trained generative adversarial network (TOP-GAN), is a hybridization between transfer learning and generative adversarial networks (GANs). Healthy cells and cancer cells of different metastatic potential have been imaged by low-coherence off-axis holography. After the acquisition, the optical path delay maps of the cells have been extracted and directly used as an input to the deep networks. In order to cope with the small number of classified images, we have used GANs to train a large number of unclassified images from another cell type (sperm cells). After this preliminary training, and after transforming the last layer of the network with new ones, we have designed an automatic classifier for the correct cell type (healthy/primary cancer/metastatic cancer) with 90-99% accuracy, although small training sets of down to several images have been used. These results are better in comparison to other classic methods that aim at coping with the same problem of a small training set. We believe that our approach makes the combination of holographic microscopy and deep learning networks more accessible to the medical field by enabling a rapid, automatic and accurate classification in stain-free imaging flow cytometry. Furthermore, our approach is expected to be applicable to many other medical image classification tasks, suffering from a small training set.
Submitted 17 December, 2018;
originally announced December 2018.
-
A Greedy Approach to $\ell_{0,\infty}$ Based Convolutional Sparse Coding
Authors:
Elad Plaut,
Raja Giryes
Abstract:
Sparse coding techniques for image processing traditionally rely on a processing of small overlapping patches separately followed by averaging. This has the disadvantage that the reconstructed image no longer obeys the sparsity prior used in the processing. For this purpose convolutional sparse coding has been introduced, where a shift-invariant dictionary is used and the sparsity of the recovered image is maintained. Most such strategies target the $\ell_0$ "norm" or the $\ell_1$ norm of the whole image, which may create an imbalanced sparsity across various regions in the image. In order to face this challenge, the $\ell_{0,\infty}$ "norm" has been proposed as an alternative that "operates locally while thinking globally". The approaches taken for tackling the non-convexity of these optimization problems have been either using a convex relaxation or local pursuit algorithms. In this paper, we present an efficient greedy method for sparse coding and dictionary learning, which is specifically tailored to $\ell_{0,\infty}$, and is based on matching pursuit. We demonstrate the usage of our approach in salt-and-pepper noise removal and image inpainting. A code package which reproduces the experiments presented in this work is available at https://web.eng.tau.ac.il/~raja
Submitted 26 December, 2018;
originally announced December 2018.
-
DeepISP: Towards Learning an End-to-End Image Processing Pipeline
Authors:
Eli Schwartz,
Raja Giryes,
Alex M. Bronstein
Abstract:
We present DeepISP, a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment. The training and evaluation of the pipeline were performed on a dedicated dataset containing pairs of low-light and well-lit images captured by a Samsung S7 smartphone camera in both raw and processed JPEG formats. The proposed solution achieves state-of-the-art performance in objective evaluation of PSNR on the subtask of joint denoising and demosaicing. For the full end-to-end pipeline, it achieves better visual quality compared to the manufacturer ISP, in both a subjective human assessment and when rated by a deep model trained for assessing image quality.
Submitted 3 February, 2019; v1 submitted 20 January, 2018;
originally announced January 2018.
-
Learned Convolutional Sparse Coding
Authors:
Hillel Sreter,
Raja Giryes
Abstract:
We propose a convolutional recurrent sparse auto-encoder model. The model consists of a sparse encoder, which is a convolutional extension of the learned ISTA (LISTA) method, and a linear convolutional decoder. Our strategy offers a simple method for learning a task-driven sparse convolutional dictionary (CD), and producing an approximate convolutional sparse code (CSC) over the learned dictionary. We trained the model to minimize reconstruction loss via gradient descent with back-propagation and have achieved competitive results to KSVD image denoising and to leading CSC methods in image inpainting requiring only a small fraction of their run-time.
Submitted 17 January, 2020; v1 submitted 1 November, 2017;
originally announced November 2017.