Search | arXiv e-print repository

Towards Realistic Landmark-Guided Facial Video Inpainting Based on GANs

Authors: Fatemeh Ghorbani Lohesara, Karen Egiazarian, Sebastian Knorr

Abstract: Facial video inpainting plays a crucial role in a wide range of applications, including but not limited to the removal of obstructions in video conferencing and telemedicine, enhancement of facial expression analysis, privacy protection, integration of graphical overlays, and virtual makeup. This domain presents serious challenges due to the intricate nature of facial features and the inherent hum… ▽ More Facial video inpainting plays a crucial role in a wide range of applications, including but not limited to the removal of obstructions in video conferencing and telemedicine, enhancement of facial expression analysis, privacy protection, integration of graphical overlays, and virtual makeup. This domain presents serious challenges due to the intricate nature of facial features and the inherent human familiarity with faces, heightening the need for accurate and persuasive completions. In addressing challenges specifically related to occlusion removal in this context, our focus is on the progressive task of generating complete images from facial data covered by masks, ensuring both spatial and temporal coherence. Our study introduces a network designed for expression-based video inpainting, employing generative adversarial networks (GANs) to handle static and moving occlusions across all frames. By utilizing facial landmarks and an occlusion-free reference image, our model maintains the user's identity consistently across frames. We further enhance emotional preservation through a customized facial expression recognition (FER) loss function, ensuring detailed inpainted outputs. Our proposed framework exhibits proficiency in eliminating occlusions from facial videos in an adaptive form, whether appearing static or dynamic on the frames, while providing realistic and coherent results. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted in Electronic Imaging 2024

arXiv:2401.14136 [pdf, other]

doi 10.1145/3626495.3626497

Expression-aware video inpainting for HMD removal in XR applications

Authors: Fatemeh Ghorbani Lohesara, Karen Egiazarian, Sebastian Knorr

Abstract: Head-mounted displays (HMDs) serve as indispensable devices for observing extended reality (XR) environments and virtual content. However, HMDs present an obstacle to external recording techniques as they block the upper face of the user. This limitation significantly affects social XR applications, specifically teleconferencing, where facial features and eye gaze information play a vital role in… ▽ More Head-mounted displays (HMDs) serve as indispensable devices for observing extended reality (XR) environments and virtual content. However, HMDs present an obstacle to external recording techniques as they block the upper face of the user. This limitation significantly affects social XR applications, specifically teleconferencing, where facial features and eye gaze information play a vital role in creating an immersive user experience. In this study, we propose a new network for expression-aware video inpainting for HMD removal (EVI-HRnet) based on generative adversarial networks (GANs). Our model effectively fills in missing information with regard to facial landmarks and a single occlusion-free reference image of the user. The framework and its components ensure the preservation of the user's identity across frames using the reference frame. To further improve the level of realism of the inpainted output, we introduce a novel facial expression recognition (FER) loss function for emotion preservation. Our results demonstrate the remarkable capability of the proposed framework to remove HMDs from facial videos while maintaining the subject's facial expression and identity. Moreover, the outputs exhibit temporal consistency along the inpainted frames. This lightweight framework presents a practical approach for HMD occlusion removal, with the potential to enhance various collaborative XR applications without the need for additional hardware. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted in CVMP 2023

arXiv:2204.07098 [pdf, other]

Residual Swin Transformer Channel Attention Network for Image Demosaicing

Authors: Wenzhu Xing, Karen Egiazarian

Abstract: Image demosaicing is problem of interpolating full- resolution color images from raw sensor (color filter array) data. During last decade, deep neural networks have been widely used in image restoration, and in particular, in demosaicing, attaining significant performance improvement. In recent years, vision transformers have been designed and successfully used in various computer vision applicati… ▽ More Image demosaicing is problem of interpolating full- resolution color images from raw sensor (color filter array) data. During last decade, deep neural networks have been widely used in image restoration, and in particular, in demosaicing, attaining significant performance improvement. In recent years, vision transformers have been designed and successfully used in various computer vision applications. One of the recent methods of image restoration based on a Swin Transformer (ST), SwinIR, demonstrates state-of-the-art performance with a smaller number of parameters than neural network-based methods. Inspired by the success of SwinIR, we propose in this paper a novel Swin Transformer-based network for image demosaicing, called RSTCANet. To extract image features, RSTCANet stacks several residual Swin Transformer Channel Attention blocks (RSTCAB), introducing the channel attention for each two successive ST blocks. Extensive experiments demonstrate that RSTCANet out- performs state-of-the-art image demosaicing methods, and has a smaller number of parameters. △ Less

Submitted 14 April, 2022; originally announced April 2022.

arXiv:2203.01695 [pdf, other]

Unfolding-Aided Bootstrapped Phase Retrieval in Optical Imaging

Authors: Samuel Pinilla, Kumar Vijay Mishra, Igor Shevkunov, Mojtaba Soltanalian, Vladimir Katkovnik, Karen Egiazarian

Abstract: Phase retrieval in optical imaging refers to the recovery of a complex signal from phaseless data acquired in the form of its diffraction patterns. These patterns are acquired through a system with a coherent light source that employs a diffractive optical element (DOE) to modulate the scene resulting in coded diffraction patterns at the sensor. Recently, the hybrid approach of model-driven networ… ▽ More Phase retrieval in optical imaging refers to the recovery of a complex signal from phaseless data acquired in the form of its diffraction patterns. These patterns are acquired through a system with a coherent light source that employs a diffractive optical element (DOE) to modulate the scene resulting in coded diffraction patterns at the sensor. Recently, the hybrid approach of model-driven network or deep unfolding has emerged as an effective alternative to conventional model-based and learning-based phase retrieval techniques because it allows for bounding the complexity of algorithms while also retaining their efficacy. Additionally, such hybrid approaches have shown promise in improving the design of DOEs that follow theoretical uniqueness conditions. There are opportunities to exploit novel experimental setups and resolve even more complex DOE phase retrieval applications. This paper presents an overview of algorithms and applications of deep unfolding for bootstrapped - regardless of near, middle, and far zones - phase retrieval. △ Less

Submitted 9 October, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 13 pages, 11 figures, 1 table

arXiv:2109.11877 [pdf, other]

Learning-based Noise Component Map Estimation for Image Denoising

Authors: Sheyda Ghanbaralizadeh Bahnemiri, Mykola Ponomarenko, Karen Egiazarian

Abstract: A problem of image denoising when images are corrupted by a non-stationary noise is considered in this paper. Since in practice no a priori information on noise is available, noise statistics should be pre-estimated for image denoising. In this paper, deep convolutional neural network (CNN) based method for estimation of a map of local, patch-wise, standard deviations of noise (so-called sigma-map… ▽ More A problem of image denoising when images are corrupted by a non-stationary noise is considered in this paper. Since in practice no a priori information on noise is available, noise statistics should be pre-estimated for image denoising. In this paper, deep convolutional neural network (CNN) based method for estimation of a map of local, patch-wise, standard deviations of noise (so-called sigma-map) is proposed. It achieves the state-of-the-art performance in accuracy of estimation of sigma-map for the case of non-stationary noise, as well as estimation of noise variance for the case of additive white Gaussian noise. Extensive experiments on image denoising using estimated sigma-maps demonstrate that our method outperforms recent CNN-based blind image denoising methods by up to 6 dB in PSNR, as well as other state-of-the-art methods based on sigma-map estimation by up to 0.5 dB, providing same time better usage flexibility. Comparison with the ideal case, when denoising is applied using ground-truth sigma-map, shows that a difference of corresponding PSNR values for most of noise levels is within 0.1-0.2 dB and does not exceeds 0.6 dB. △ Less

Submitted 24 September, 2021; originally announced September 2021.

Comments: 5 pages, submitted to IEEE Signal Processing Letters

arXiv:2003.00762 [pdf, other]

Flashlight CNN Image Denoising

Authors: Pham Huu Thanh Binh, Cristóvão Cruz, Karen Egiazarian

Abstract: This paper proposes a learning-based denoising method called FlashLight CNN (FLCNN) that implements a deep neural network for image denoising. The proposed approach is based on deep residual networks and inception networks and it is able to leverage many more parameters than residual networks alone for denoising grayscale images corrupted by additive white Gaussian noise (AWGN). FlashLight CNN dem… ▽ More This paper proposes a learning-based denoising method called FlashLight CNN (FLCNN) that implements a deep neural network for image denoising. The proposed approach is based on deep residual networks and inception networks and it is able to leverage many more parameters than residual networks alone for denoising grayscale images corrupted by additive white Gaussian noise (AWGN). FlashLight CNN demonstrates state of the art performance when compared quantitatively and visually with the current state of the art image denoising methods. △ Less

Submitted 2 July, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

arXiv:1910.10100 [pdf, ps, other]

doi 10.1109/TCI.2020.3032101

The Practicality of Stochastic Optimization in Imaging Inverse Problems

Authors: Junqi Tang, Karen Egiazarian, Mohammad Golbabaee, Mike Davies

Abstract: In this work we investigate the practicality of stochastic gradient descent and recently introduced variants with variance-reduction techniques in imaging inverse problems. Such algorithms have been shown in the machine learning literature to have optimal complexities in theory, and provide great improvement empirically over the deterministic gradient methods. Surprisingly, in some tasks such as i… ▽ More In this work we investigate the practicality of stochastic gradient descent and recently introduced variants with variance-reduction techniques in imaging inverse problems. Such algorithms have been shown in the machine learning literature to have optimal complexities in theory, and provide great improvement empirically over the deterministic gradient methods. Surprisingly, in some tasks such as image deblurring, many of such methods fail to converge faster than the accelerated deterministic gradient methods, even in terms of epoch counts. We investigate this phenomenon and propose a theory-inspired mechanism for the practitioners to efficiently characterize whether it is beneficial for an inverse problem to be solved by stochastic optimization techniques or not. Using standard tools in numerical linear algebra, we derive conditions on the spectral structure of the inverse problem for being a suitable application of stochastic gradient methods. Particularly, we show that, for an imaging inverse problem, if and only if its Hessain matrix has a fast-decaying eigenspectrum, then the stochastic gradient methods can be more advantageous than deterministic methods for solving such a problem. Our results also provide guidance on choosing appropriately the partition minibatch schemes, showing that a good minibatch scheme typically has relatively low correlation within each of the minibatches. Finally, we propose an accelerated primal-dual SGD algorithm in order to tackle another key bottleneck of stochastic optimization which is the heavy computation of proximal operators. The proposed method has fast convergence rate in practice, and is able to efficiently handle non-smooth regularization terms which are coupled with linear operators. △ Less

Submitted 8 November, 2019; v1 submitted 22 October, 2019; originally announced October 2019.

Journal ref: Published in IEEE Transactions on Computational Imaging, Vol. 6, 2020

arXiv:1910.03013 [pdf, ps, other]

Hyperspectral holography and spectroscopy: computational features of inverse discrete cosine transform

Authors: Vladimir Katkovnik, Igor Shevkunov, Karen Egiazarian

Abstract: Broadband hyperspectral digital holography and Fourier transform spectroscopy are important instruments in various science and application fields. In the digital hyperspectral holography and spectroscopy the variable of interest are obtained as inverse discrete cosine transforms of observed diffractive intensity patterns. In these notes, we provide a variety of algorithms for the inverse cosine tr… ▽ More Broadband hyperspectral digital holography and Fourier transform spectroscopy are important instruments in various science and application fields. In the digital hyperspectral holography and spectroscopy the variable of interest are obtained as inverse discrete cosine transforms of observed diffractive intensity patterns. In these notes, we provide a variety of algorithms for the inverse cosine transform with the proofs of perfect spectrum reconstruction, as well as we discuss and illustrate some nontrivial features of these algorithms. △ Less

Submitted 4 October, 2019; originally announced October 2019.

Comments: 20 pages, 9 figures

arXiv:1803.02112 [pdf, other]

doi 10.1109/LSP.2018.2850222

Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising

Authors: Cristóvão Cruz, Alessandro Foi, Vladimir Katkovnik, Karen Egiazarian

Abstract: We introduce a paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising. It is a combination of a local multiscale denoising by a convolutional neural network (CNN) based denoiser and a nonlocal denoising based on a nonlocal filter (NLF) exploiting the mutual similarities between groups of patches. CNN models are leveraged with noise levels that progressively decrease… ▽ More We introduce a paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising. It is a combination of a local multiscale denoising by a convolutional neural network (CNN) based denoiser and a nonlocal denoising based on a nonlocal filter (NLF) exploiting the mutual similarities between groups of patches. CNN models are leveraged with noise levels that progressively decrease at every iteration of our framework, while their output is regularized by a nonlocal prior implicit within the NLF. Unlike complicated neural networks that embed the nonlocality prior within the layers of the network, our framework is modular, it uses standard pre-trained CNNs together with standard nonlocal filters. An instance of the proposed framework, called NN3D, is evaluated over large grayscale image datasets showing state-of-the-art performance. △ Less

Submitted 21 June, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

Comments: Accepted for publication in IEEE SPL

arXiv:1711.10792 [pdf, other]

Blind estimation of white Gaussian noise variance in highly textured images

Authors: Mykola Ponomarenko, Nikolay Gapon, Viacheslav Voronin, Karen Egiazarian

Abstract: In the paper, a new method of blind estimation of noise variance in a single highly textured image is proposed. An input image is divided into 8x8 blocks and discrete cosine transform (DCT) is performed for each block. A part of 64 DCT coefficients with lowest energy calculated through all blocks is selected for further analysis. For the DCT coefficients, a robust estimate of noise variance is cal… ▽ More In the paper, a new method of blind estimation of noise variance in a single highly textured image is proposed. An input image is divided into 8x8 blocks and discrete cosine transform (DCT) is performed for each block. A part of 64 DCT coefficients with lowest energy calculated through all blocks is selected for further analysis. For the DCT coefficients, a robust estimate of noise variance is calculated. Corresponding to the obtained estimate, a part of blocks having very large values of local variance calculated only for the selected DCT coefficients are excluded from the further analysis. These two steps (estimation of noise variance and exclusion of blocks) are iteratively repeated three times. For the verification of the proposed method, a new noise-free test image database TAMPERE17 consisting of many highly textured images is designed. It is shown for this database and different values of noise variance from the set {25, 49, 100, 225}, that the proposed method provides approximately two times lower estimation root mean square error than other methods. △ Less

Submitted 29 November, 2017; originally announced November 2017.

Comments: IS&T International Symposium on Electronic Imaging (EI 2018), Image Processing: Algorithms and Systems XVI

arXiv:1711.00693 [pdf, other]

Statistical evaluation of visual quality metrics for image denoising

Authors: Karen Egiazarian, Mykola Ponomarenko, Vladimir Lukin, Oleg Ieremeiem

Abstract: This paper studies the problem of full reference visual quality assessment of denoised images with a special emphasis on images with low contrast and noise-like texture. Denoising of such images together with noise removal often results in image details loss or smoothing. A new test image database, FLT, containing 75 noise-free "reference" images and 300 filtered ("distorted") images is developed.… ▽ More This paper studies the problem of full reference visual quality assessment of denoised images with a special emphasis on images with low contrast and noise-like texture. Denoising of such images together with noise removal often results in image details loss or smoothing. A new test image database, FLT, containing 75 noise-free "reference" images and 300 filtered ("distorted") images is developed. Each reference image, corrupted by an additive white Gaussian noise, is denoised by the BM3D filter with four different values of threshold parameter (four levels of noise suppression). After carrying out a perceptual quality assessment of distorted images, the mean opinion scores (MOS) are obtained and compared with the values of known full reference quality metrics. As a result, the Spearman Rank Order Correlation Coefficient (SROCC) between PSNR values and MOS has a value close to zero, and SROCC between values of known full-reference image visual quality metrics and MOS does not exceed 0.82 (which is reached by a new visual quality metric proposed in this paper). The FLT dataset is more complex than earlier datasets used for assessment of visual quality for image denoising. Thus, it can be effectively used to design new image visual quality metrics for image denoising. △ Less

Submitted 2 November, 2017; originally announced November 2017.

Comments: Submitted to ICASSP 2018

arXiv:1711.00362 [pdf, other]

Complex-valued image denosing based on group-wise complex-domain sparsity

Authors: Vladimir Katkovnik, Mykola Ponomarenko, Karen Egiazarian

Abstract: Phase imaging and wavefront reconstruction from noisy observations of complex exponent is a topic of this paper. It is a highly non-linear problem because the exponent is a 2π-periodic function of phase. The reconstruction of phase and amplitude is difficult. Even with an additive Gaussian noise in observations distributions of noisy components in phase and amplitude are signal dependent and non-G… ▽ More Phase imaging and wavefront reconstruction from noisy observations of complex exponent is a topic of this paper. It is a highly non-linear problem because the exponent is a 2π-periodic function of phase. The reconstruction of phase and amplitude is difficult. Even with an additive Gaussian noise in observations distributions of noisy components in phase and amplitude are signal dependent and non-Gaussian. Additional difficulties follow from a prior unknown correlation of phase and amplitude in real life scenarios. In this paper, we propose a new class of non-iterative and iterative complex domain filters based on group-wise sparsity in complex domain. This sparsity is based on the techniques implemented in Block-Matching 3D filtering (BM3D) and 3D/4D High-Order Singular Decomposition (HOSVD) exploited for spectrum design, analysis and filtering. The introduced algorithms are a generalization of the ideas used in the CD-BM3D algorithms presented in our previous publications. The algorithms are implemented as a MATLAB Toolbox. The efficiency of the algorithms is demonstrated by simulation tests. △ Less

Submitted 1 November, 2017; originally announced November 2017.

Comments: Submitted to Signal Processing

arXiv:1704.04126 [pdf, other]

doi 10.1109/TIP.2017.2779265

Single Image Super-Resolution based on Wiener Filter in Similarity Domain

Authors: Cristóvão Cruz, Rakesh Mehta, Vladimir Katkovnik, Karen Egiazarian

Abstract: Single image super resolution (SISR) is an ill-posed problem aiming at estimating a plausible high resolution (HR) image from a single low resolution (LR) image. Current state-of-the-art SISR methods are patch-based. They use either external data or internal self-similarity to learn a prior for a HR image. External data based methods utilize large number of patches from the training data, while se… ▽ More Single image super resolution (SISR) is an ill-posed problem aiming at estimating a plausible high resolution (HR) image from a single low resolution (LR) image. Current state-of-the-art SISR methods are patch-based. They use either external data or internal self-similarity to learn a prior for a HR image. External data based methods utilize large number of patches from the training data, while self-similarity based approaches leverage one or more similar patches from the input image. In this paper we propose a self-similarity based approach that is able to use large groups of similar patches extracted from the input image to solve the SISR problem. We introduce a novel prior leading to collaborative filtering of patch groups in 1D similarity domain and couple it with an iterative back-projection framework. The performance of the proposed algorithm is evaluated on a number of SISR benchmark datasets. Without using any external data, the proposed approach outperforms the current non-CNN based methods on the tested datasets for various scaling factors. On certain datasets, the gain is over 1 dB, when compared to the recent method A+. For high sampling rate (x4) the proposed method performs similarly to very recent state-of-the-art deep convolutional network based approaches. △ Less

Submitted 29 November, 2017; v1 submitted 13 April, 2017; originally announced April 2017.

Comments: Paper accepted for publication on IEEE Transactions on Image Processing

arXiv:0708.2893 [pdf]

Fast Recursive Coding Based on Grou** of Symbols

Authors: Nikolay Ponomarenko, Vladimir Lukin, Karen Egiazarian, Jaakko Astola, Boris Y Ryabko

Abstract: A novel fast recursive coding technique is proposed. It operates with only integer values not longer 8 bits and is multiplication free. Recursion the algorithm is based on indirectly provides rather effective coding of symbols for very large alphabets. The code length for the proposed technique can be up to 20-30% less than for arithmetic coding and, in the worst case it is only by 1-3% larger. A novel fast recursive coding technique is proposed. It operates with only integer values not longer 8 bits and is multiplication free. Recursion the algorithm is based on indirectly provides rather effective coding of symbols for very large alphabets. The code length for the proposed technique can be up to 20-30% less than for arithmetic coding and, in the worst case it is only by 1-3% larger. △ Less

Submitted 21 August, 2007; originally announced August 2007.

Comments: 3 pages, submitted to IEEE Transactions on Information Theory

ACM Class: E.4

arXiv:cs/0504005 [pdf, ps, other]

Fast Codes for Large Alphabets

Authors: Boris Ryabko, Jaakko Astola, Karen Egiazarian

Abstract: We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities in subsets (acting as super letters) and use time consuming coding for these subsets only, whereas letters in the subsets have the same code length and therefore can be coded fast. The des… ▽ More We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities in subsets (acting as super letters) and use time consuming coding for these subsets only, whereas letters in the subsets have the same code length and therefore can be coded fast. The described scheme can be applied to sources with known and unknown statistics. △ Less

Submitted 2 April, 2005; originally announced April 2005.

Comments: published

Showing 1–15 of 15 results for author: Egiazarian, K