-
To Beta or Not To Beta: Information Bottleneck for Digital Image Forensics
Authors:
Aurobrata Ghosh,
Zheng Zhong,
Steve Cruz,
Subbu Veeravasarapu,
Terrance E Boult,
Maneesh Singh
Abstract:
We consider an information theoretic approach to address the problem of identifying fake digital images. We propose an innovative method to formulate the issue of localizing manipulated regions in an image as a deep representation learning problem using the Information Bottleneck (IB), which has recently gained popularity as a framework for interpreting deep neural networks. Tampered images pose a serious predicament since digitized media is a ubiquitous part of our lives. These are facilitated by the easy availability of image editing software and aggravated by recent advances in deep generative models such as GANs. We propose InfoPrint, a computationally efficient solution to the IB formulation using approximate variational inference and compare it to a numerical solution that is computationally expensive. Testing on a number of standard datasets, we demonstrate that InfoPrint outperforms the state-of-the-art and the numerical solution. Additionally, it also has the ability to detect alterations made by inpainting GANs.
Submitted 11 August, 2019;
originally announced August 2019.
-
Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders
Authors:
Ananya Harsh Jha,
Saket Anand,
Maneesh Singh,
V. S. R. Veeravasarapu
Abstract:
Generative models that learn disentangled representations for different factors of variation in an image can be very useful for targeted data augmentation. By sampling from the disentangled latent subspace of interest, we can efficiently generate new data necessary for a particular task. Learning disentangled representations is a challenging problem, especially when certain factors of variation are difficult to label. In this paper, we introduce a novel architecture that disentangles the latent space into two complementary subspaces by using only weak supervision in the form of pairwise similarity labels. Inspired by the recent success of cycle-consistent adversarial architectures, we use cycle-consistency in a variational auto-encoder framework. Our non-adversarial approach is in contrast with recent works that combine adversarial training with auto-encoders to disentangle representations. We show compelling results of disentangled latent subspaces on three datasets and compare with recent works that leverage adversarial training.
Submitted 27 April, 2018;
originally announced April 2018.
-
Adversarially Tuned Scene Generation
Authors:
V S R Veeravasarapu,
Constantin Rothkopf,
Ramesh Visvanathan
Abstract:
Generalization performance of trained computer vision systems that use computer graphics (CG) generated data is not yet effective due to the concept of 'domain-shift' between virtual and real data. Although simulated data augmented with a few real world samples has been shown to mitigate domain shift and improve transferability of trained models, guiding or bootstrapping the virtual data generation with the distributions learnt from the target real world domain is desired, especially in fields where annotating even a few real images is laborious (such as semantic labeling and intrinsic images). In order to address this problem in an unsupervised manner, our work combines recent advances in CG (which aims to generate stochastic scene layouts coupled with large collections of 3D object models) and generative adversarial training (which aims to train generative models by measuring discrepancy between generated and real data in terms of their separability in the space of a deep discriminatively-trained classifier). Our method uses iterative estimation of the posterior density of prior distributions for a generative graphical model. This is done within a rejection sampling framework. Initially, we assume uniform distributions as priors on the parameters of a scene described by a generative graphical model. As iterations proceed the prior distributions get updated to distributions that are closer to the (unknown) distributions of target data. We demonstrate the utility of adversarially tuned scene generation on two real-world benchmark datasets (CityScapes and CamVid) for traffic scene semantic labeling with a deep convolutional net (DeepLab). We realized performance improvements of 2.28 and 3.14 points (using the IoU metric) between the DeepLab models trained on simulated sets prepared from the scene generation models before and after tuning to CityScapes and CamVid respectively.
Submitted 7 July, 2017; v1 submitted 2 January, 2017;
originally announced January 2017.
-
Model-driven Simulations for Deep Convolutional Neural Networks
Authors:
V S R Veeravasarapu,
Constantin Rothkopf,
Visvanathan Ramesh
Abstract:
The use of simulated virtual environments to train deep convolutional neural networks (CNN) is a currently active practice to reduce the (real) data-hungriness of the deep CNN models, especially in application domains in which large scale real data and/or groundtruth acquisition is difficult or laborious. Recent approaches have attempted to harness the capabilities of existing video games and animated movies to provide training data with high precision groundtruth. However, a stumbling block is in how one can certify generalization of the learned models and their usefulness in real world data sets. This opens up fundamental questions such as: What is the role of photorealism of graphics simulations in training CNN models? Are the trained models valid in reality? What are possible ways to reduce the performance bias? In this work, we begin to address these issues systematically in the context of urban semantic understanding with CNNs. Towards this end, we (a) propose a simple probabilistic urban scene model, (b) develop a parametric rendering tool to synthesize the data with groundtruth, followed by (c) a systematic exploration of the impact of level-of-realism on the generality of the trained CNN model to real world; and domain adaptation concepts to minimize the performance bias.
Submitted 31 May, 2016;
originally announced May 2016.
-
Cardiac Motion Analysis by Temporal Flow Graphs
Authors:
V S R Veeravasarapu,
Jayanthi Sivaswamy,
Vishanji Karani
Abstract:
Cardiac motion analysis from B-mode ultrasound sequence is a key task in assessing the health of the heart. The paper proposes a new methodology for cardiac motion analysis based on the temporal behaviour of points of interest on the myocardium. We define a new signal called the Temporal Flow Graph (TFG) which depicts the movement of a point of interest over time. It is a graphical representation derived from a flow field and describes the temporal evolution of a point. We prove that TFG for an object undergoing periodic motion is also periodic. This principle can be utilized to derive both global and local information from a given sequence. We demonstrate this for detecting motion irregularities at the sequence, as well as regional levels on real and synthetic data. A coarse localisation of anatomical landmarks such as centres of left/right cavities and valve points is also demonstrated using TFGs.
Submitted 23 April, 2016;
originally announced April 2016.
-
Model Validation for Vision Systems via Graphics Simulation
Authors:
V S R Veeravasarapu,
Rudra Narayan Hota,
Constantin Rothkopf,
Ramesh Visvanathan
Abstract:
Rapid advances in computation, combined with the latest advances in computer graphics simulations, have facilitated the development of vision systems and training them in virtual environments. One major stumbling block is in certification of the designs and tuned parameters of these systems to work in the real world. In this paper, we begin to explore the fundamental question: Which type of information transfer is more analogous to real world? Inspired by the performance characterization methodology outlined in the 90's, we note that insights derived from simulations can be qualitative or quantitative depending on the degree of the fidelity of models used in simulations and the nature of the questions posed by the experimenter. We adapt the methodology in the context of current graphics simulation tools for modeling data generation processes and for systematic performance characterization and trade-off analysis for vision system design, leading to qualitative and quantitative insights. Concretely, we examine invariance assumptions used in vision algorithms for video surveillance settings as a case study and assess the degree to which those invariance assumptions deviate as a function of contextual variables on both graphics simulations and in real data. As computer graphics rendering quality improves, we believe teasing apart the degree to which model assumptions are valid via systematic graphics simulation can be a significant aid to more principled ways of approaching vision system design and performance modeling.
Submitted 4 December, 2015;
originally announced December 2015.
-
Simulations for Validation of Vision Systems
Authors:
V S R Veeravasarapu,
Rudra Narayan Hota,
Constantin Rothkopf,
Ramesh Visvanathan
Abstract:
As computer vision matures into a systems science and engineering discipline, there is a trend in leveraging the latest advances in computer graphics simulations for performance evaluation, learning, and inference. However, there is an open question on the utility of graphics simulations for vision, with apparently contradicting views in the literature. In this paper, we place the results from the recent literature in the context of the performance characterization methodology outlined in the 90's and note that insights derived from simulations can be qualitative or quantitative depending on the degree of fidelity of models used in simulation and the nature of the question posed by the experimenter. We describe a simulation platform that incorporates the latest graphics advances and use it for systematic performance characterization and trade-off analysis for vision system design. We verify the utility of the platform in a case study of validating a generative model inspired vision hypothesis, the Rank-Order consistency model, in the contexts of global and local illumination changes, bad weather, and high-frequency noise. Our approach establishes the link between alternative viewpoints, involving models with physics based semantics and signal and perturbation semantics, and confirms insights in the literature on robust change detection.
Submitted 3 December, 2015;
originally announced December 2015.