-
Latent Intrinsics Emerge from Training to Relight
Authors:
Xiao Zhang,
William Gao,
Seemandhar Jain,
Michael Maire,
David A. Forsyth,
Anand Bhattad
Abstract:
Image relighting is the task of showing what a scene from a source image would look like if illuminated differently. Inverse graphics schemes recover an explicit representation of geometry and a set of chosen intrinsics, then relight with some form of renderer. However error control for inverse graphics is difficult, and inverse graphics methods can represent only the effects of the chosen intrinsics. This paper describes a relighting method that is entirely data-driven, where intrinsics and lighting are each represented as latent variables. Our approach produces SOTA relightings of real scenes, as measured by standard metrics. We show that albedo can be recovered from our latent intrinsics without using any example albedos, and that the albedos recovered are competitive with SOTA methods.
Submitted 31 May, 2024;
originally announced May 2024.
-
OPSurv: Orthogonal Polynomials Quadrature Algorithm for Survival Analysis
Authors:
Lilian W. Bialokozowicz,
Hoang M. Le,
Tristan Sylvain,
Peter A. I. Forsyth,
Vineel Nagisetty,
Greg Mori
Abstract:
This paper introduces the Orthogonal Polynomials Quadrature Algorithm for Survival Analysis (OPSurv), a new method providing time-continuous functional outputs for both single and competing risks scenarios in survival analysis. OPSurv utilizes the initial zero condition of the Cumulative Incidence function and a unique decomposition of probability densities using orthogonal polynomials, allowing it to learn functional approximation coefficients for each risk event and construct Cumulative Incidence Function estimates via Gauss--Legendre quadrature. This approach effectively counters overfitting, particularly in competing risks scenarios, enhancing model expressiveness and control. The paper further details empirical validations and theoretical justifications of OPSurv, highlighting its robust performance as an advancement in survival analysis with competing risks.
Submitted 2 February, 2024;
originally announced February 2024.
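The quadrature step named in the abstract above can be illustrated with a minimal sketch. This is not the OPSurv implementation: the function names and the exponential toy density are assumptions chosen only because the exact integral is known. It shows how Gauss-Legendre nodes on [-1, 1] are mapped to [0, t] to approximate a cumulative incidence value, and why the estimate is zero at t = 0.

```python
import numpy as np


def cumulative_incidence(density, t, n_nodes=16):
    """Approximate F(t) = integral of density(s) over [0, t] by Gauss-Legendre quadrature.

    `density` is any vectorized callable; this helper is hypothetical and
    is not taken from the OPSurv paper.
    """
    # Nodes and weights for the reference interval [-1, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    # Affine map from [-1, 1] to [0, t]; the Jacobian of the map is t / 2.
    s = 0.5 * t * (nodes + 1.0)
    return 0.5 * t * np.sum(weights * density(s))


def toy_density(s):
    """Illustrative density exp(-s), whose exact integral is 1 - exp(-t)."""
    return np.exp(-s)


approx = cumulative_incidence(toy_density, 2.0)
exact = 1.0 - np.exp(-2.0)
```

Because the interval length multiplies the whole sum, the estimate at t = 0 is exactly zero, matching the initial zero condition mentioned in the abstract.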
-
Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Authors:
Ayush Sarkar,
Hanlin Mai,
Amitabh Mahapatra,
Svetlana Lazebnik,
D. A. Forsyth,
Anand Bhattad
Abstract:
Generative models can produce impressively realistic images. This paper demonstrates that generated images have geometric features different from those of real images. We build a set of collections of generated images, prequalified to fool simple, signal-based classifiers into believing they are real. We then show that prequalified generated images can be identified reliably by classifiers that only look at geometric properties. We use three such classifiers. All three classifiers are denied access to image pixels, and look only at derived geometric features. The first classifier looks at the perspective field of the image, the second looks at lines detected in the image, and the third looks at relations between detected objects and shadows. Our procedure detects generated images more reliably than SOTA local signal based detectors, for images from a number of distinct generators. Saliency maps suggest that the classifiers can identify geometric problems reliably. We conclude that current generators cannot reliably reproduce geometric properties of real images.
Submitted 30 May, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
StyleGAN knows Normal, Depth, Albedo, and More
Authors:
Anand Bhattad,
Daniel McKee,
Derek Hoiem,
D. A. Forsyth
Abstract:
Intrinsic images, in the original sense, are image-like maps of scene properties like depth, normal, albedo or shading. This paper demonstrates that StyleGAN can easily be induced to produce intrinsic images. The procedure is straightforward. We show that, if StyleGAN produces $G({w})$ from latents ${w}$, then for each type of intrinsic image, there is a fixed offset ${d}_c$ so that $G({w}+{d}_c)$ is that type of intrinsic image for $G({w})$. Here ${d}_c$ is {\em independent of ${w}$}. The StyleGAN we used was pretrained by others, so this property is not some accident of our training regime. We show that there are image transformations StyleGAN will {\em not} produce in this fashion, so StyleGAN is not a generic image regression engine.
It is conceptually exciting that an image generator should ``know'' and represent intrinsic images. There may also be practical advantages to using a generative model to produce intrinsic images. The intrinsic images obtained from StyleGAN compare well both qualitatively and quantitatively with those obtained by using SOTA image regression techniques; but StyleGAN's intrinsic images are robust to relighting effects, unlike SOTA methods.
Submitted 1 June, 2023;
originally announced June 2023.
-
Make It So: Steering StyleGAN for Any Image Inversion and Editing
Authors:
Anand Bhattad,
Viraj Shah,
Derek Hoiem,
D. A. Forsyth
Abstract:
StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables, but accurately mapping real-world images to their latent variables (GAN inversion) remains a challenge. Existing GAN inversion methods struggle to maintain editing directions and produce realistic results.
To address these limitations, we propose Make It So, a novel GAN inversion method that operates in the $\mathcal{Z}$ (noise) space rather than the typical $\mathcal{W}$ (latent style) space. Make It So preserves editing capabilities, even for out-of-domain images. This is a crucial property that was overlooked in prior methods. Our quantitative evaluations demonstrate that Make It So outperforms the state-of-the-art method PTI~\cite{roich2021pivotal} by a factor of five in inversion accuracy and achieves ten times better edit quality for complex indoor scenes.
Submitted 27 April, 2023;
originally announced April 2023.
-
Neural Network Approach to Portfolio Optimization with Leverage Constraints: a Case Study on High Inflation Investment
Authors:
Chendi Ni,
Yuying Li,
Peter A. Forsyth
Abstract:
Motivated by the current global high inflation scenario, we aim to discover a dynamic multi-period allocation strategy to optimally outperform a passive benchmark while adhering to a bounded leverage limit. To this end, we formulate an optimal control problem to outperform a benchmark portfolio throughout the investment horizon. Assuming the asset prices follow the jump-diffusion model during high inflation periods, we first establish a closed-form solution for the optimal strategy that outperforms a passive strategy under the cumulative quadratic tracking difference (CD) objective, assuming continuous trading and no bankruptcy. To obtain strategies under the bounded leverage constraint among other realistic constraints, we then propose a novel leverage-feasible neural network (LFNN) to represent control, which converts the original constrained optimization problem into an unconstrained optimization problem that is computationally feasible with standard optimization methods. We establish mathematically that the LFNN approximation can yield a solution that is arbitrarily close to the solution of the original optimal control problem with bounded leverage. We further apply the LFNN approach to a four-asset investment scenario with bootstrap resampled asset returns from the filtered high inflation regime data. The LFNN strategy is shown to consistently outperform the passive benchmark strategy by about 200 bps (median annualized return), with a greater than 90% probability of outperforming the benchmark at the end of the investment horizon.
Submitted 24 May, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Sustainable Wireless Services with UAV Swarms Tailored to Renewable Energy Sources
Authors:
Igor Donevski,
Marco Virgili,
Nithin Babu,
Jimmy Jessen Nielsen,
Andrew J. Forsyth,
Constantinos B. Papadias,
Petar Popovski
Abstract:
Unmanned Aerial Vehicle (UAV) swarms are often required in off-grid scenarios, such as disaster-struck, war-torn or rural areas, where the UAVs have no access to the power grid and instead rely on renewable energy. Considering a main battery fed from two renewable sources, wind and solar, we scale such a system based on the financial budget, environmental characteristics, and seasonal variations. Interestingly, the source of energy is correlated with the energy expenditure of the UAVs, since strong winds cause UAV hovering to become increasingly energy-hungry. The aim is to maximize the cost efficiency of coverage at a particular location, which is a combinatorial optimization problem for dimensioning of the multivariate energy generation system under non-convex criteria. We have devised a customized algorithm by lowering the processing complexity and reducing the solution space through sampling. Evaluation is done with condensed real-world data on wind, solar energy, and traffic load per unit area, driven by vendor-provided prices. The implementation was tested in four locations, with varying wind or solar intensity. The best results were achieved in locations with mild wind presence and strong solar irradiation, while locations with strong winds and low solar intensity require higher Capital Expenditure (CAPEX) allocation.
Submitted 23 November, 2022; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Energy-Efficient Trajectory Design of a Multi-IRS Assisted Portable Access Point
Authors:
Nithin Babu,
Marco Virgili,
Mohammad Al-jarrah,
Xiaoye Jing,
Emad Alsusa,
Petar Popovski,
Andrew Forsyth,
Christos Masouros,
Constantinos B. Papadias
Abstract:
In this work, we propose a framework for energy-efficient trajectory design of an unmanned aerial vehicle (UAV)-based portable access point (PAP) deployed to serve a set of ground nodes (GNs). In addition to the PAP and GNs, the system includes a set of intelligent reflecting surfaces (IRSs) mounted on man-made structures to increase the number of bits transmitted per Joule of energy consumed, measured as the global energy efficiency (GEE). The GEE trajectory for the PAP is designed by considering the UAV propulsion energy consumption and the Peukert effect of the PAP battery, which represents an accurate battery discharge profile as a non-linear function of the UAV power consumption profile. The GEE trajectory design problem is solved in two phases: in the first, a path for the PAP and feasible positions for the IRS modules are found using a multi-tier circle packing method, and the required IRS phase shift values are calculated using an alternate optimization method that considers the interdependence between the amplitude and phase responses of an IRS element; in the second phase, the PAP flying velocity and user scheduling are calculated using a novel multi-lap trajectory design algorithm. Numerical evaluations show that: neglecting the Peukert effect overestimates the available flight time of the PAP; after a certain threshold, increasing the battery size reduces the available flight time of the PAP; the presence of IRS modules improves the GEE of the system compared to other baseline scenarios; the multi-lap trajectory saves more energy compared to a single-lap trajectory developed using a combination of sequential convex programming and the Dinkelbach algorithm.
Submitted 1 September, 2022;
originally announced September 2022.
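The claim above that neglecting the Peukert effect overestimates flight time can be illustrated with the classical form of Peukert's law. This is a sketch only: the paper's own battery discharge model is not reproduced here, and the function name and all numeric values (capacity, rated discharge time, current draw, Peukert constant) are assumptions for illustration.

```python
def peukert_runtime_hours(capacity_ah, rated_hours, current_a, k):
    """Classical Peukert's law: t = H * (C / (I * H)) ** k.

    C is the rated capacity (Ah) at an H-hour discharge, I is the actual
    current (A), and k is the Peukert constant (k = 1 means an ideal battery).
    """
    if current_a <= 0 or rated_hours <= 0 or capacity_ah <= 0:
        raise ValueError("capacity, rated time and current must be positive")
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k


# Hypothetical battery: 10 Ah rated over 20 h, drawn at 5 A.
ideal_hours = peukert_runtime_hours(10.0, 20.0, 5.0, k=1.0)    # reduces to C / I
peukert_hours = peukert_runtime_hours(10.0, 20.0, 5.0, k=1.2)  # shorter at high current
```

With k greater than 1 and a current above the rated discharge current, the predicted runtime is shorter than the ideal C / I estimate, which is the direction of the overestimate described in the abstract.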
-
StyLitGAN: Prompting StyleGAN to Produce New Illumination Conditions
Authors:
Anand Bhattad,
D. A. Forsyth
Abstract:
We propose a novel method, StyLitGAN, for relighting and resurfacing generated images in the absence of labeled data. Our approach generates images with realistic lighting effects, including cast shadows, soft shadows, inter-reflections, and glossy effects, without the need for paired or CGI data.
StyLitGAN uses an intrinsic image method to decompose an image, followed by a search of the latent space of a pre-trained StyleGAN to identify a set of directions. By prompting the model to fix one component (e.g., albedo) and vary another (e.g., shading), we generate relighted images by adding the identified directions to the latent style codes. Quantitative metrics of change in albedo and lighting diversity allow us to choose effective directions using a forward selection process. Qualitative evaluation confirms the effectiveness of our method.
Submitted 1 May, 2023; v1 submitted 20 May, 2022;
originally announced May 2022.
-
SIRfyN: Single Image Relighting from your Neighbors
Authors:
D. A. Forsyth,
Anand Bhattad,
Pranav Asthana,
Yuanyi Zhong,
Yuxiong Wang
Abstract:
We show how to relight a scene, depicted in a single image, such that (a) the overall shading has changed and (b) the resulting image looks like a natural image of that scene. Applications for such a procedure include generating training data and building authoring environments. Naive methods for doing this fail. One reason is that shading and albedo are quite strongly related; for example, sharp boundaries in shading tend to appear at depth discontinuities, which are usually apparent in albedo. The same scene can be lit in different ways, and established theory shows the different lightings form a cone (the illumination cone). Novel theory shows that one can use similar scenes to estimate the different lightings that apply to a given scene, with bounded expected error. Our method exploits this theory to estimate a representation of the available lighting fields in the form of imputed generators of the illumination cone. Our procedure does not require expensive "inverse graphics" datasets, and sees no ground truth data of any kind.
Qualitative evaluation suggests the method can erase and restore soft indoor shadows, and can "steer" light around a scene. We offer a summary quantitative evaluation of the method with a novel application of the FID. An extension of the FID allows per-generated-image evaluation. Furthermore, we offer qualitative evaluation with a user study, and show that our method produces images that can successfully be used for data augmentation.
Submitted 8 December, 2021;
originally announced December 2021.
-
Intrinsic Image Decomposition using Paradigms
Authors:
D. A. Forsyth,
Jason J. Rock
Abstract:
Intrinsic image decomposition is the classical task of mapping image to albedo. The WHDR dataset allows methods to be evaluated by comparing predictions to human judgements ("lighter", "same as", "darker"). The best modern intrinsic image methods learn a map from image to albedo using rendered models and human judgements. This is convenient for practical methods, but cannot explain how a visual agent without geometric, surface and illumination models and a renderer could learn to recover intrinsic images.
This paper describes a method that learns intrinsic image decomposition without seeing WHDR annotations, rendered data, or ground truth data. The method relies on paradigms - fake albedos and fake shading fields - together with a novel smoothing procedure that ensures good behavior at short scales on real images. Long scale error is controlled by averaging. Our method achieves WHDR scores competitive with those of strong recent methods allowed to see training WHDR annotations, rendered data, and ground truth data. Because our method is unsupervised, we can compute estimates of the test/train variance of WHDR scores; these are quite large, and it is unsafe to rely on small differences in reported WHDR.
Submitted 20 November, 2020;
originally announced November 2020.
-
Cooperating RPN's Improve Few-Shot Object Detection
Authors:
Weilin Zhang,
Yu-Xiong Wang,
David A. Forsyth
Abstract:
Learning to detect an object in an image from very few training examples - few-shot object detection - is challenging, because the classifier that sees proposal boxes has very little training data. A particularly challenging training regime occurs when there are one or two training examples. In this case, if the region proposal network (RPN) misses even one high intersection-over-union (IOU) training box, the classifier's model of how object appearance varies can be severely impacted. We use multiple distinct yet cooperating RPN's. Our RPN's are trained to be different, but not too different; doing so yields significant performance improvements over state of the art for COCO and PASCAL VOC in the very few-shot setting. This effect appears to be independent of the choice of classifier or dataset.
Submitted 19 November, 2020;
originally announced November 2020.
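The intersection-over-union (IOU) measure referred to in the abstract above can be sketched as follows. This is the standard definition for axis-aligned boxes, not code from the paper; the `(x1, y1, x2, y2)` corner convention and the function name are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

A proposal is usually treated as a match for a training box when this value exceeds a chosen threshold, which is why a missed high-IOU box matters so much when only one or two training examples exist.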
-
Cut-and-Paste Object Insertion by Enabling Deep Image Prior for Reshading
Authors:
Anand Bhattad,
David A. Forsyth
Abstract:
We show how to insert an object from one image to another and get realistic results in the hard case, where the shading of the inserted object clashes with the shading of the scene. Rendering objects using an illumination model of the scene doesn't work, because doing so requires a geometric and material model of the object, which is hard to recover from a single image. In this paper, we introduce a method that corrects shading inconsistencies of the inserted object without requiring a geometric and physical model or an environment map. Our method uses a deep image prior (DIP), trained to produce reshaded renderings of inserted objects via consistent image decomposition inferential losses. The resulting image from DIP aims to have (a) an albedo similar to the cut-and-paste albedo, (b) a similar shading field to that of the target scene, and (c) a shading that is consistent with the cut-and-paste surface normals. The result is a simple procedure that produces convincing shading of the inserted object. We show the efficacy of our method both qualitatively and quantitatively for several objects with complex surface properties and also on a dataset of spherical lampshades for quantitative evaluation. Our method significantly outperforms an Image Harmonization (IH) baseline for all these objects. It also outperforms the cut-and-paste and IH baselines in a user study with over 100 users.
Submitted 13 September, 2022; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Unrestricted Adversarial Examples via Semantic Manipulation
Authors:
Anand Bhattad,
Min Jin Chong,
Kaizhao Liang,
Bo Li,
D. A. Forsyth
Abstract:
Machine learning models, especially deep neural networks (DNNs), have been shown to be vulnerable to adversarial examples, which are carefully crafted samples with a small perturbation magnitude. Such adversarial perturbations are usually restricted by bounding their $\mathcal{L}_p$ norm such that they are imperceptible, and thus many current defenses can exploit this property to reduce their adversarial impact. In this paper, we instead introduce "unrestricted" perturbations that manipulate semantically meaningful image-based visual descriptors - color and texture - in order to generate effective and photorealistic adversarial examples. We show that these semantically aware perturbations are effective against JPEG compression, feature squeezing and adversarially trained models. We also show that the proposed methods can effectively be applied to both image classification and image captioning tasks on complex datasets such as ImageNet and MSCOCO. In addition, we conduct comprehensive user studies to show that our generated semantic adversarial examples are photorealistic to humans despite large magnitude perturbations when compared to other attacks.
Submitted 20 March, 2020; v1 submitted 12 April, 2019;
originally announced April 2019.
-
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech
Authors:
Aditya Deshpande,
Jyoti Aneja,
Liwei Wang,
Alexander Schwing,
D. A. Forsyth
Abstract:
Image captioning is an ambiguous problem, with many suitable captions for an image. To address ambiguity, beam search is the de facto method for sampling multiple captions. However, beam search is computationally expensive and known to produce generic captions. To address this concern, some variational auto-encoder (VAE) and generative adversarial net (GAN) based methods have been proposed. Though diverse, GAN and VAE are less accurate. In this paper, we first predict a meaningful summary of the image, then generate the caption based on that summary. We use part-of-speech as summaries, since our summary should drive caption generation. We achieve the trifecta: (1) High accuracy for the diverse captions as evaluated by standard captioning metrics and user studies; (2) Faster computation of diverse captions compared to beam search and diverse beam search; and (3) High diversity as evaluated by counting novel sentences, distinct n-grams and mutual overlap (i.e., mBleu-4) scores.
Submitted 10 April, 2019; v1 submitted 31 May, 2018;
originally announced May 2018.
-
Quantitative Evaluation of Style Transfer
Authors:
Mao-Chuang Yeh,
Shuai Tang,
Anand Bhattad,
D. A. Forsyth
Abstract:
Style transfer methods produce a transferred image which is a rendering of a content image in the manner of a style image. There is a rich literature of variant methods. However, evaluation procedures are qualitative, mostly involving user studies. We describe a novel quantitative evaluation procedure. One plots effectiveness (a measure of the extent to which the style was transferred) against coherence (a measure of the extent to which the transferred image decomposes into objects in the same way that the content image does) to obtain an EC plot.
We construct EC plots comparing a number of recent style transfer methods. Most methods control within-layer gram matrices, but we also investigate a method that controls cross-layer gram matrices. These EC plots reveal a number of intriguing properties of recent style transfer methods. The style used has a strong effect on the outcome, for all methods. Using large style weights does not necessarily improve effectiveness, and can produce worse results. Cross-layer gram matrices easily beat all other methods, but some styles remain difficult for all methods. Ensemble methods show real promise. It is likely that, for current methods, each style requires a different choice of weights to obtain the best results, so that automated weight setting methods are desirable. Finally, we show evidence comparing our EC evaluations to human evaluations.
Submitted 31 March, 2018;
originally announced April 2018.
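The distinction drawn in the abstract above between within-layer and cross-layer gram matrices can be sketched as follows. This is not the authors' code: the function name, the `(channels, height, width)` feature layout, the normalization by the number of spatial positions, and the requirement that both layers already share one spatial size (for example after resampling) are all assumptions made for illustration.

```python
import numpy as np


def gram(features_a, features_b=None):
    """Gram matrix between the channels of one or two feature maps.

    Each input has shape (channels, height, width). With a single input this
    is a within-layer gram matrix; with two inputs it is a cross-layer one.
    """
    if features_b is None:
        features_b = features_a
    if features_a.shape[1:] != features_b.shape[1:]:
        raise ValueError("feature maps must share the same spatial size")
    # Flatten spatial positions so each row holds one channel's responses.
    a = features_a.reshape(features_a.shape[0], -1)
    b = features_b.reshape(features_b.shape[0], -1)
    # Average channel-by-channel products over all spatial positions.
    return a @ b.T / a.shape[1]


rng = np.random.default_rng(0)
layer_1 = rng.standard_normal((3, 4, 4))  # stand-in for one layer's activations
layer_2 = rng.standard_normal((5, 4, 4))  # stand-in for another layer's activations
within = gram(layer_1)
cross = gram(layer_1, layer_2)
```

The within-layer matrix is square and symmetric, while the cross-layer matrix has one row per channel of the first layer and one column per channel of the second.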