-
High Fidelity Synthetic Face Generation for Rosacea Skin Condition from Limited Data
Authors:
Anwesha Mohanty,
Alistair Sutherland,
Marija Bezbradica,
Hossein Javidnia
Abstract:
Similar to the majority of deep learning applications, diagnosing skin diseases using computer vision and deep learning often requires a large volume of data. However, obtaining sufficient data for particular types of facial skin conditions can be difficult due to privacy concerns. As a result, conditions like Rosacea are often understudied in computer-aided diagnosis. The limited availability of data for facial skin conditions has led to the investigation of alternative methods for computer-aided diagnosis. In recent years, Generative Adversarial Networks (GANs), mainly variants of StyleGAN, have demonstrated promising results in generating synthetic facial images. In this study, for the first time, a small dataset of 300 full-face Rosacea images is used to investigate the possibility of generating synthetic data. The preliminary experiments show how fine-tuning the model and varying experimental settings significantly affect the fidelity of the Rosacea features. It is demonstrated that the $R_1$ regularization strength helps achieve high-fidelity details. Additionally, this study presents qualitative evaluations of the synthetic faces by expert dermatologists and non-specialist participants. A quantitative evaluation is presented using a few validation metrics. Furthermore, a number of limitations and future directions are discussed. Code and generated dataset are available at: \url{https://github.com/thinkercache/stylegan2-ada-pytorch}
Submitted 8 March, 2023;
originally announced March 2023.
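Note on the $R_1$ regularization named in the abstract above: the abstract does not spell out the penalty, so the following is the commonly used definition of $R_1$ regularization for GAN discriminators, given here as background rather than as the authors' exact formulation. The symbols $D$ (discriminator), $p_{\mathrm{data}}$ (real-image distribution) and $\gamma$ (regularization strength) are notation assumed for this note.

$$
R_1 = \frac{\gamma}{2}\, \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[ \left\lVert \nabla_x D(x) \right\rVert_2^2 \right]
$$

The term is added to the discriminator loss and penalizes the gradient of the discriminator on real images only. A larger $\gamma$ gives a smoother discriminator; the "regularization strength" in the abstract presumably refers to this weight.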
-
An Advert Creation System for 3D Product Placements
Authors:
Ivan Bacher,
Hossein Javidnia,
Soumyabrata Dev,
Rahul Agrahari,
Murhaf Hossari,
Matthew Nicholson,
Clare Conran,
Jian Tang,
Peng Song,
David Corrigan,
François Pitié
Abstract:
Over the past decade, the evolution of video-sharing platforms has attracted a significant amount of investment in contextual advertising. The common contextual advertising platforms utilize the information provided by users to integrate 2D visual ads into videos. The existing platforms face many technical challenges, such as ad integration with respect to occluding objects and 3D ad placement. This paper presents a Video Advertisement Placement & Integration (Adverts) framework, which is capable of perceiving the 3D geometry of the scene and camera motion to blend 3D virtual objects into videos and create the illusion of reality. The proposed framework contains several modules, such as monocular depth estimation, object segmentation, background-foreground separation, alpha matting and camera tracking. Our experiments conducted using the Adverts framework indicate the significant potential of this system in contextual ad integration, and in pushing the limits of the advertising industry using mixed reality technologies.
Submitted 26 June, 2020;
originally announced June 2020.
-
Methodology for Building Synthetic Datasets with Virtual Humans
Authors:
Shubhajit Basak,
Hossein Javidnia,
Faisal Khan,
Rachel McDonnell,
Michael Schukat
Abstract:
Recent advances in deep learning methods have increased the performance of face detection and recognition systems. The accuracy of these models relies on the range of variation provided in the training data. Creating a dataset that represents all variations of real-world faces is not feasible as the control over the quality of the data decreases with the size of the dataset. Repeatability of data is another challenge as it is not possible to exactly recreate 'real-world' acquisition conditions outside of the laboratory. In this work, we explore a framework to synthetically generate facial data to be used as part of a toolchain to generate very large facial datasets with a high degree of control over facial and environmental variations. Such large datasets can be used for improved, targeted training of deep neural networks. In particular, we make use of a 3D morphable face model for the rendering of multiple 2D images across a dataset of 100 synthetic identities, providing full control over image variations such as pose, illumination, and background.
Submitted 21 June, 2020;
originally announced June 2020.
-
Background Matting
Authors:
Hossein Javidnia,
François Pitié
Abstract:
The current state-of-the-art alpha matting methods mainly rely on the trimap as the sole additional guidance for estimating alpha. This paper investigates the effects of utilising the background information as well as the trimap in the process of alpha calculation. To achieve this goal, a state-of-the-art method, AlphaGAN, is adopted and modified to process the background information as an extra input channel. Extensive experiments are performed to analyse the effect of the background information in image and video matting, such as training with mildly and heavily distorted backgrounds. Based on the quantitative evaluations performed on the Adobe Composition-1k dataset, the proposed pipeline significantly outperforms the state-of-the-art methods on the AlphaMatting benchmark metrics.
Submitted 11 February, 2020;
originally announced February 2020.
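Note on the alpha values discussed in the abstract above: alpha matting is conventionally framed through the compositing equation I = alpha * F + (1 - alpha) * B, where F is the foreground colour, B the background colour and alpha the per-pixel opacity. The sketch below illustrates only that standard equation in plain Python; it is not the paper's pipeline, and the function names `composite` and `composite_image` are hypothetical names chosen for this illustration.

```python
def composite(alpha, fg, bg):
    """Blend one pixel per channel: I = alpha * F + (1 - alpha) * B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def composite_image(alpha_map, fg_img, bg_img):
    """Apply the compositing equation to every pixel of row-major nested lists."""
    return [
        [composite(a, f, b) for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha_map, fg_img, bg_img)
    ]


# A half-transparent red foreground pixel over a blue background pixel.
pixel = composite(0.5, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Matting is the inverse problem: given I, recover alpha (and usually F). Knowing B, as the abstract proposes, removes one of the unknowns in this equation.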
-
Identifying Candidate Spaces for Advert Implantation
Authors:
Soumyabrata Dev,
Hossein Javidnia,
Murhaf Hossari,
Matthew Nicholson,
Killian McCabe,
Atul Nautiyal,
Clare Conran,
Jian Tang,
Wei Xu,
François Pitié
Abstract:
Virtual advertising is an important and promising feature in the area of online advertising. It involves integrating adverts onto live or recorded videos for product placements and targeted advertisements. Such integration of adverts is primarily done by video editors in the post-production stage, which is cumbersome and time-consuming. Therefore, it is important to automatically identify candidate spaces in a video frame, wherein new adverts can be implanted. The candidate space should match the scene perspective, and also have a high quality of experience according to human subjective judgment. In this paper, we propose the use of a bespoke neural net that can assist the video editors in identifying candidate spaces. We benchmark our approach against several deep-learning architectures on a large-scale image dataset of candidate spaces of outdoor scenes. Our work is the first of its kind in this area of multimedia and augmented reality applications, and achieves the best results.
Submitted 8 October, 2019;
originally announced October 2019.
-
Versatile Auxiliary Classifier with Generative Adversarial Network (VAC+GAN)
Authors:
Shabab Bazrafkan,
Hossein Javidnia,
Peter Corcoran
Abstract:
One of the most interesting challenges in Artificial Intelligence is to train conditional generators which are able to provide labeled adversarial samples drawn from a specific distribution. In this work, a new framework is presented to train a deep conditional generator by placing a classifier in parallel with the discriminator and back-propagating the classification error through the generator network. The method is versatile, is applicable to any variant of Generative Adversarial Network (GAN) implementation, and gives superior results compared to similar methods.
Submitted 18 June, 2018; v1 submitted 1 May, 2018;
originally announced May 2018.
-
The Application of Preconditioned Alternating Direction Method of Multipliers in Depth from Focal Stack
Authors:
Hossein Javidnia,
Peter Corcoran
Abstract:
The post-capture refocusing effect in smartphone cameras is achievable by using focal stacks. However, the accuracy of this effect depends entirely on the combination of the depth layers in the stack. The accuracy of the extended depth of field effect in this application can be improved significantly by computing an accurate depth map, which has been an open issue for decades. To tackle this issue, this paper proposes a framework based on the Preconditioned Alternating Direction Method of Multipliers (PADMM) for depth from the focal stack and synthetic defocus application. In addition to its ability to provide high structural accuracy and occlusion handling, the optimization function of the proposed method can converge faster and better than state-of-the-art methods. The evaluation has been done on 21 sets of focal stacks and the optimization function has been compared against 5 other methods. Preliminary results indicate that the proposed method performs better in terms of structural accuracy and optimization than the current state-of-the-art methods.
Submitted 21 November, 2017;
originally announced November 2017.
-
Total Variation-Based Dense Depth from Multi-Camera Array
Authors:
Hossein Javidnia,
Peter Corcoran
Abstract:
Multi-camera arrays are increasingly employed in both consumer and industrial applications, and various passive techniques are documented to estimate depth from such camera arrays. Current depth estimation methods provide useful estimations of depth in an imaged scene but are often impractical due to significant computational requirements. This paper presents a novel framework that generates a high-quality continuous depth map from multi-camera array/light field cameras. The proposed framework utilizes analysis of the local Epipolar Plane Image (EPI) to initiate the depth estimation process. The estimated depth map is then processed using Total Variation (TV) minimization based on the Fenchel-Rockafellar duality. Evaluation of this method based on a well-known benchmark indicates that the proposed framework performs well in terms of accuracy when compared to the top-ranked depth estimation methods and a baseline algorithm. The test dataset includes both photorealistic and non-photorealistic scenes. Notably, the computational resources required to achieve an equivalent accuracy are significantly reduced when compared to the top algorithms. As a consequence, the proposed framework is suitable for deployment in consumer and industrial applications.
Submitted 21 November, 2017;
originally announced November 2017.
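Note on the Total Variation minimization named in the abstract above: the abstract does not state the energy that is minimized, so the following is the textbook TV-regularized (ROF-type) model, shown only as an illustration of what such a refinement step typically looks like. The symbols $f$ (initial depth map, here the EPI-based estimate), $u$ (refined depth map), $\lambda$ (data-fidelity weight) and $p$ (dual variable) are notation assumed for this note.

$$
\min_{u} \int_{\Omega} \lvert \nabla u \rvert \, dx + \frac{\lambda}{2} \int_{\Omega} (u - f)^2 \, dx
$$

Writing the TV term through its dual representation, $\lvert \nabla u \rvert = \max_{\lvert p \rvert \le 1} \langle p, \nabla u \rangle$, gives the saddle-point form on which duality-based solvers operate:

$$
\min_{u} \max_{\lVert p \rVert_\infty \le 1} \; \langle \nabla u, p \rangle + \frac{\lambda}{2} \lVert u - f \rVert_2^2
$$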
-
Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture
Authors:
S. Bazrafkan,
H. Javidnia,
J. Lemley,
P. Corcoran
Abstract:
Deep neural networks have been applied to a wide range of problems in recent years. In this work, a Convolutional Neural Network (CNN) is applied to the problem of determining the depth from a single camera image (monocular depth). Eight different networks are designed to perform depth estimation, each of them suitable for a feature level. Networks with different pooling sizes determine different feature levels. After designing a set of networks, these models may be combined into a single network topology using graph optimization techniques. This "Semi Parallel Deep Neural Network (SPDNN)" eliminates duplicated common network layers, and can be further optimized by retraining to achieve an improved model compared to the individual topologies. In this study, four SPDNN models are trained and evaluated in two stages on the KITTI dataset. The ground truth images in the first part of the experiment are provided by the benchmark, and for the second part, the ground truth images are the depth map results from applying a state-of-the-art stereo matching method. The results of this evaluation demonstrate that using post-processing techniques to refine the target of the network increases the accuracy of depth estimation on individual mono images. The second evaluation shows that using segmentation data alongside the original data as the input can improve the depth estimation results to a point where performance is comparable with stereo depth estimation. The computational time is also discussed in this study.
Submitted 18 April, 2018; v1 submitted 10 March, 2017;
originally announced March 2017.