-
Towards Realistic Landmark-Guided Facial Video Inpainting Based on GANs
Authors:
Fatemeh Ghorbani Lohesara,
Karen Egiazarian,
Sebastian Knorr
Abstract:
Facial video inpainting plays a crucial role in a wide range of applications, including but not limited to the removal of obstructions in video conferencing and telemedicine, enhancement of facial expression analysis, privacy protection, integration of graphical overlays, and virtual makeup. This domain presents serious challenges due to the intricate nature of facial features and the inherent human familiarity with faces, heightening the need for accurate and persuasive completions. In addressing challenges specifically related to occlusion removal in this context, our focus is on the progressive task of generating complete images from facial data covered by masks, ensuring both spatial and temporal coherence. Our study introduces a network designed for expression-based video inpainting, employing generative adversarial networks (GANs) to handle static and moving occlusions across all frames. By utilizing facial landmarks and an occlusion-free reference image, our model maintains the user's identity consistently across frames. We further enhance emotional preservation through a customized facial expression recognition (FER) loss function, ensuring detailed inpainted outputs. Our proposed framework exhibits proficiency in eliminating occlusions from facial videos in an adaptive form, whether appearing static or dynamic on the frames, while providing realistic and coherent results.
Submitted 14 February, 2024;
originally announced February 2024.
-
Expression-aware video inpainting for HMD removal in XR applications
Authors:
Fatemeh Ghorbani Lohesara,
Karen Egiazarian,
Sebastian Knorr
Abstract:
Head-mounted displays (HMDs) serve as indispensable devices for observing extended reality (XR) environments and virtual content. However, HMDs present an obstacle to external recording techniques as they block the upper face of the user. This limitation significantly affects social XR applications, specifically teleconferencing, where facial features and eye gaze information play a vital role in creating an immersive user experience. In this study, we propose a new network for expression-aware video inpainting for HMD removal (EVI-HRnet) based on generative adversarial networks (GANs). Our model effectively fills in missing information with regard to facial landmarks and a single occlusion-free reference image of the user. The framework and its components ensure the preservation of the user's identity across frames using the reference frame. To further improve the level of realism of the inpainted output, we introduce a novel facial expression recognition (FER) loss function for emotion preservation. Our results demonstrate the remarkable capability of the proposed framework to remove HMDs from facial videos while maintaining the subject's facial expression and identity. Moreover, the outputs exhibit temporal consistency along the inpainted frames. This lightweight framework presents a practical approach for HMD occlusion removal, with the potential to enhance various collaborative XR applications without the need for additional hardware.
Submitted 25 January, 2024;
originally announced January 2024.
-
Residual Swin Transformer Channel Attention Network for Image Demosaicing
Authors:
Wenzhu Xing,
Karen Egiazarian
Abstract:
Image demosaicing is the problem of interpolating full-resolution color images from raw sensor (color filter array) data. During the last decade, deep neural networks have been widely used in image restoration, and in particular in demosaicing, attaining significant performance improvements. In recent years, vision transformers have been designed and successfully used in various computer vision applications. One of the recent methods of image restoration based on a Swin Transformer (ST), SwinIR, demonstrates state-of-the-art performance with a smaller number of parameters than neural network-based methods. Inspired by the success of SwinIR, we propose in this paper a novel Swin Transformer-based network for image demosaicing, called RSTCANet. To extract image features, RSTCANet stacks several residual Swin Transformer Channel Attention blocks (RSTCAB), introducing channel attention for every two successive ST blocks. Extensive experiments demonstrate that RSTCANet outperforms state-of-the-art image demosaicing methods while having a smaller number of parameters.
Submitted 14 April, 2022;
originally announced April 2022.
-
Unfolding-Aided Bootstrapped Phase Retrieval in Optical Imaging
Authors:
Samuel Pinilla,
Kumar Vijay Mishra,
Igor Shevkunov,
Mojtaba Soltanalian,
Vladimir Katkovnik,
Karen Egiazarian
Abstract:
Phase retrieval in optical imaging refers to the recovery of a complex signal from phaseless data acquired in the form of its diffraction patterns. These patterns are acquired through a system with a coherent light source that employs a diffractive optical element (DOE) to modulate the scene resulting in coded diffraction patterns at the sensor. Recently, the hybrid approach of model-driven network or deep unfolding has emerged as an effective alternative to conventional model-based and learning-based phase retrieval techniques because it allows for bounding the complexity of algorithms while also retaining their efficacy. Additionally, such hybrid approaches have shown promise in improving the design of DOEs that follow theoretical uniqueness conditions. There are opportunities to exploit novel experimental setups and resolve even more complex DOE phase retrieval applications. This paper presents an overview of algorithms and applications of deep unfolding for bootstrapped - regardless of near, middle, and far zones - phase retrieval.
Submitted 9 October, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Learning-based Noise Component Map Estimation for Image Denoising
Authors:
Sheyda Ghanbaralizadeh Bahnemiri,
Mykola Ponomarenko,
Karen Egiazarian
Abstract:
This paper considers the problem of image denoising when images are corrupted by non-stationary noise. Since in practice no a priori information on noise is available, noise statistics should be pre-estimated for image denoising. In this paper, a deep convolutional neural network (CNN) based method for estimating a map of local, patch-wise standard deviations of noise (the so-called sigma-map) is proposed. It achieves state-of-the-art accuracy in sigma-map estimation for the case of non-stationary noise, as well as in noise variance estimation for the case of additive white Gaussian noise. Extensive experiments on image denoising using estimated sigma-maps demonstrate that our method outperforms recent CNN-based blind image denoising methods by up to 6 dB in PSNR, as well as other state-of-the-art methods based on sigma-map estimation by up to 0.5 dB, while at the same time providing better usage flexibility. Comparison with the ideal case, when denoising is applied using the ground-truth sigma-map, shows that the difference between the corresponding PSNR values is within 0.1-0.2 dB for most noise levels and does not exceed 0.6 dB.
Submitted 24 September, 2021;
originally announced September 2021.
-
Flashlight CNN Image Denoising
Authors:
Pham Huu Thanh Binh,
Cristóvão Cruz,
Karen Egiazarian
Abstract:
This paper proposes a learning-based denoising method called FlashLight CNN (FLCNN) that implements a deep neural network for image denoising. The proposed approach is based on deep residual networks and inception networks and it is able to leverage many more parameters than residual networks alone for denoising grayscale images corrupted by additive white Gaussian noise (AWGN). FlashLight CNN demonstrates state of the art performance when compared quantitatively and visually with the current state of the art image denoising methods.
Submitted 2 July, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
-
The Practicality of Stochastic Optimization in Imaging Inverse Problems
Authors:
Junqi Tang,
Karen Egiazarian,
Mohammad Golbabaee,
Mike Davies
Abstract:
In this work we investigate the practicality of stochastic gradient descent and recently introduced variants with variance-reduction techniques in imaging inverse problems. Such algorithms have been shown in the machine learning literature to have optimal complexities in theory, and provide great improvement empirically over the deterministic gradient methods. Surprisingly, in some tasks such as image deblurring, many of such methods fail to converge faster than the accelerated deterministic gradient methods, even in terms of epoch counts. We investigate this phenomenon and propose a theory-inspired mechanism for practitioners to efficiently characterize whether it is beneficial for an inverse problem to be solved by stochastic optimization techniques or not. Using standard tools in numerical linear algebra, we derive conditions on the spectral structure of the inverse problem for being a suitable application of stochastic gradient methods. Particularly, we show that, for an imaging inverse problem, the stochastic gradient methods can be more advantageous than deterministic methods if and only if its Hessian matrix has a fast-decaying eigenspectrum. Our results also provide guidance on appropriately choosing the partition minibatch schemes, showing that a good minibatch scheme typically has relatively low correlation within each of the minibatches. Finally, we propose an accelerated primal-dual SGD algorithm in order to tackle another key bottleneck of stochastic optimization, which is the heavy computation of proximal operators. The proposed method has a fast convergence rate in practice, and is able to efficiently handle non-smooth regularization terms which are coupled with linear operators.
Submitted 8 November, 2019; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Hyperspectral holography and spectroscopy: computational features of inverse discrete cosine transform
Authors:
Vladimir Katkovnik,
Igor Shevkunov,
Karen Egiazarian
Abstract:
Broadband hyperspectral digital holography and Fourier transform spectroscopy are important instruments in various fields of science and application. In digital hyperspectral holography and spectroscopy, the variables of interest are obtained as inverse discrete cosine transforms of observed diffractive intensity patterns. In these notes, we provide a variety of algorithms for the inverse cosine transform with proofs of perfect spectrum reconstruction, and we discuss and illustrate some nontrivial features of these algorithms.
Submitted 4 October, 2019;
originally announced October 2019.
-
Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising
Authors:
Cristóvão Cruz,
Alessandro Foi,
Vladimir Katkovnik,
Karen Egiazarian
Abstract:
We introduce a paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising. It is a combination of a local multiscale denoising by a convolutional neural network (CNN) based denoiser and a nonlocal denoising based on a nonlocal filter (NLF) exploiting the mutual similarities between groups of patches. CNN models are leveraged with noise levels that progressively decrease at every iteration of our framework, while their output is regularized by a nonlocal prior implicit within the NLF. Unlike complicated neural networks that embed the nonlocality prior within the layers of the network, our framework is modular: it uses standard pre-trained CNNs together with standard nonlocal filters. An instance of the proposed framework, called NN3D, is evaluated over large grayscale image datasets, showing state-of-the-art performance.
Submitted 21 June, 2018; v1 submitted 6 March, 2018;
originally announced March 2018.
-
Blind estimation of white Gaussian noise variance in highly textured images
Authors:
Mykola Ponomarenko,
Nikolay Gapon,
Viacheslav Voronin,
Karen Egiazarian
Abstract:
In this paper, a new method of blind estimation of noise variance in a single highly textured image is proposed. An input image is divided into 8x8 blocks and a discrete cosine transform (DCT) is performed for each block. A subset of the 64 DCT coefficients with the lowest energy, calculated over all blocks, is selected for further analysis. For these DCT coefficients, a robust estimate of noise variance is calculated. Based on the obtained estimate, blocks having very large values of local variance, calculated only for the selected DCT coefficients, are excluded from further analysis. These two steps (estimation of noise variance and exclusion of blocks) are iteratively repeated three times. For the verification of the proposed method, a new noise-free test image database, TAMPERE17, consisting of many highly textured images is designed. It is shown for this database and different values of noise variance from the set {25, 49, 100, 225} that the proposed method provides approximately two times lower estimation root mean square error than other methods.
Submitted 29 November, 2017;
originally announced November 2017.
-
Statistical evaluation of visual quality metrics for image denoising
Authors:
Karen Egiazarian,
Mykola Ponomarenko,
Vladimir Lukin,
Oleg Ieremeiem
Abstract:
This paper studies the problem of full reference visual quality assessment of denoised images with a special emphasis on images with low contrast and noise-like texture. Denoising such images often results in loss or smoothing of image details together with noise removal. A new test image database, FLT, containing 75 noise-free "reference" images and 300 filtered ("distorted") images is developed. Each reference image, corrupted by additive white Gaussian noise, is denoised by the BM3D filter with four different values of threshold parameter (four levels of noise suppression). After carrying out a perceptual quality assessment of distorted images, the mean opinion scores (MOS) are obtained and compared with the values of known full reference quality metrics. As a result, the Spearman Rank Order Correlation Coefficient (SROCC) between PSNR values and MOS has a value close to zero, and SROCC between values of known full-reference image visual quality metrics and MOS does not exceed 0.82 (which is reached by a new visual quality metric proposed in this paper). The FLT dataset is more complex than earlier datasets used for assessment of visual quality for image denoising. Thus, it can be effectively used to design new image visual quality metrics for image denoising.
Submitted 2 November, 2017;
originally announced November 2017.
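As an illustration of the SROCC comparison mentioned in the abstract above, the following is a minimal sketch of how a rank correlation between metric values and MOS could be computed. It is not the authors' evaluation code; the function names (`ranks`, `pearson`, `srocc`) and the sample numbers are hypothetical and chosen only for illustration.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def srocc(metric_scores, mos):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(metric_scores), ranks(mos))


# Hypothetical metric values and mean opinion scores, not data from the paper.
example_metric = [30.1, 28.4, 33.0, 25.7]
example_mos = [3.2, 2.9, 4.1, 2.0]
example_srocc = srocc(example_metric, example_mos)
```

Because SROCC depends only on the ordering of the scores, a metric that ranks images in the same order as the observers reaches a value near 1 regardless of its numeric scale.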
-
Complex-valued image denosing based on group-wise complex-domain sparsity
Authors:
Vladimir Katkovnik,
Mykola Ponomarenko,
Karen Egiazarian
Abstract:
Phase imaging and wavefront reconstruction from noisy observations of a complex exponent is the topic of this paper. It is a highly non-linear problem because the exponent is a 2π-periodic function of phase. The reconstruction of phase and amplitude is difficult. Even with additive Gaussian noise in the observations, the distributions of the noisy components in phase and amplitude are signal dependent and non-Gaussian. Additional difficulties follow from the a priori unknown correlation of phase and amplitude in real-life scenarios. In this paper, we propose a new class of non-iterative and iterative complex-domain filters based on group-wise sparsity in the complex domain. This sparsity is based on the techniques implemented in Block-Matching 3D filtering (BM3D) and 3D/4D High-Order Singular Value Decomposition (HOSVD) exploited for spectrum design, analysis and filtering. The introduced algorithms are a generalization of the ideas used in the CD-BM3D algorithms presented in our previous publications. The algorithms are implemented as a MATLAB Toolbox. The efficiency of the algorithms is demonstrated by simulation tests.
Submitted 1 November, 2017;
originally announced November 2017.
-
Single Image Super-Resolution based on Wiener Filter in Similarity Domain
Authors:
Cristóvão Cruz,
Rakesh Mehta,
Vladimir Katkovnik,
Karen Egiazarian
Abstract:
Single image super resolution (SISR) is an ill-posed problem aiming at estimating a plausible high resolution (HR) image from a single low resolution (LR) image. Current state-of-the-art SISR methods are patch-based. They use either external data or internal self-similarity to learn a prior for an HR image. External data based methods utilize a large number of patches from the training data, while self-similarity based approaches leverage one or more similar patches from the input image. In this paper we propose a self-similarity based approach that is able to use large groups of similar patches extracted from the input image to solve the SISR problem. We introduce a novel prior leading to collaborative filtering of patch groups in a 1D similarity domain and couple it with an iterative back-projection framework. The performance of the proposed algorithm is evaluated on a number of SISR benchmark datasets. Without using any external data, the proposed approach outperforms the current non-CNN based methods on the tested datasets for various scaling factors. On certain datasets, the gain is over 1 dB when compared to the recent method A+. For a high sampling rate (x4), the proposed method performs similarly to very recent state-of-the-art deep convolutional network based approaches.
Submitted 29 November, 2017; v1 submitted 13 April, 2017;
originally announced April 2017.
-
Fast Recursive Coding Based on Grouping of Symbols
Authors:
Nikolay Ponomarenko,
Vladimir Lukin,
Karen Egiazarian,
Jaakko Astola,
Boris Y Ryabko
Abstract:
A novel fast recursive coding technique is proposed. It operates only on integer values no longer than 8 bits and is multiplication free. The recursion on which the algorithm is based indirectly provides rather effective coding of symbols for very large alphabets. The code length for the proposed technique can be up to 20-30% less than for arithmetic coding and, in the worst case, is only 1-3% larger.
Submitted 21 August, 2007;
originally announced August 2007.
-
Fast Codes for Large Alphabets
Authors:
Boris Ryabko,
Jaakko Astola,
Karen Egiazarian
Abstract:
We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities in subsets (acting as super letters) and use time consuming coding for these subsets only, whereas letters in the subsets have the same code length and therefore can be coded fast. The described scheme can be applied to sources with known and unknown statistics.
Submitted 2 April, 2005;
originally announced April 2005.
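The grouping idea in the abstract above can be illustrated with a small calculation. The sketch below is a simplification under stated assumptions, not the scheme from the paper: it forms a single subset of rare letters using a hypothetical probability `threshold`, charges the ideal length -log2(p) for everything coded individually (including the super letter), and charges a fixed-length index for a letter inside the subset. The function names and the example distribution are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def grouped_code_length(probs, threshold):
    """Ideal average code length when rare letters share one super letter.

    Assumption: letters with probability below `threshold` form one subset.
    """
    rare = [p for p in probs if 0 < p < threshold]
    common = [p for p in probs if p >= threshold]
    # Common letters are coded individually with ideal length -log2(p).
    total = sum(-p * math.log2(p) for p in common)
    if rare:
        p_group = sum(rare)
        # Letters inside the subset all get the same fixed-length index.
        index_bits = math.ceil(math.log2(len(rare))) if len(rare) > 1 else 0
        total += p_group * (-math.log2(p_group) + index_bits)
    return total


# Hypothetical source: two frequent letters and eight equally rare ones.
example_probs = [0.5, 0.25] + [0.25 / 8] * 8
example_entropy = entropy(example_probs)
example_grouped = grouped_code_length(example_probs, threshold=0.1)
```

In this example the rare letters are equiprobable and their count is a power of two, so grouping costs nothing relative to the entropy; with uneven rare probabilities the grouped length exceeds the entropy, which is the price paid for coding the subset members quickly.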