Search | arXiv e-print repository

Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis

Authors: Yifan Wang, Aleksander Holynski, Brian L. Curless, Steven M. Seitz

Abstract: We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model. We seed this fine-tuning process with a sample texture patch, which can be optionally generated from a text-to-image model like DALL-E 2. At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU. We compare synthesized textures from our method to existing work in patch-based and deep learning texture synthesis methods. We also showcase two applications of our generated textures in 3D rendering and texture transfer.
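The abstract does not describe the score aggregation strategy in detail. As a rough illustration only, one common way to apply a patch-sized diffusion model to a larger canvas is to run it on overlapping tiles and average the per-pixel predictions. The sketch below assumes that tile-averaging scheme; `aggregate_scores`, `tile_origins`, and the predictor `eps_fn` are hypothetical names, not the authors' code.

```python
import numpy as np

def tile_origins(length, tile, stride):
    """Start indices of tiles covering [0, length), with the last tile flush to the edge."""
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts

def aggregate_scores(x, eps_fn, tile=64, stride=32):
    """Average a patch-level noise/score prediction over overlapping tiles of a large canvas.

    x:      current noisy canvas, shape (H, W) or (H, W, C), with H, W >= tile
    eps_fn: hypothetical per-patch predictor (e.g. a fine-tuned diffusion UNet at one
            timestep) that returns an array with the same shape as its input patch
    """
    H, W = x.shape[:2]
    if H < tile or W < tile:
        raise ValueError("canvas must be at least one tile in each dimension")
    acc = np.zeros(x.shape, dtype=np.float64)
    count = np.zeros((H, W) + (1,) * (x.ndim - 2))
    for top in tile_origins(H, tile, stride):
        for left in tile_origins(W, tile, stride):
            window = (slice(top, top + tile), slice(left, left + tile))
            acc[window] += eps_fn(x[window])
            count[window] += 1.0
    # Every pixel is covered by at least one tile, so the division is safe.
    return acc / count
```

In a sampler built this way, the aggregated prediction would stand in for the single-pass prediction at each denoising step, so peak memory depends on the tile size rather than the output resolution.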

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.17104 [pdf, other]

Don't Look at the Camera: Achieving Perceived Eye Contact

Authors: Alice Gao, Samyukta Jayakumar, Marcello Maniglia, Brian Curless, Ira Kemelmacher-Shlizerman, Aaron R. Seitz, Steven M. Seitz

Abstract: We consider the question of how best to achieve the perception of eye contact when a person is captured by a camera and then rendered on a 2D display. For single subjects photographed by a camera, conventional wisdom tells us that looking directly into the camera achieves eye contact. Through empirical user studies, we show that it is instead preferable to look just below the camera lens. We quantitatively assess where subjects should direct their gaze relative to a camera lens to optimize the perception that they are making eye contact.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2402.09526 [pdf, other]

C3NN: Cosmological Correlator Convolutional Neural Network -- an interpretable machine learning tool for cosmological analyses

Authors: Zhengyangguang Gong, Anik Halder, Annabelle Bohrdt, Stella Seitz, David Gebauer

Abstract: Modern cosmological research in large-scale structure has witnessed an increasing number of applications of machine learning methods. Among them, Convolutional Neural Networks (CNNs) have received substantial attention due to their outstanding performance in image classification, cosmological parameter inference and various other tasks. However, many models which make use of CNNs are criticized as "black boxes" due to the difficulties in relating their outputs intuitively and quantitatively to the cosmological fields under investigation. To overcome this challenge, we present the Cosmological Correlator Convolutional Neural Network (C3NN), a fusion of CNN architecture with the framework of cosmological N-point correlation functions (NPCFs). We demonstrate that the output of this model can be expressed explicitly in terms of the analytically tractable NPCFs. Together with other auxiliary algorithms, we are able to open the "black box" by quantitatively ranking different orders of the interpretable convolution outputs based on their contribution to classification tasks. As a proof of concept, we demonstrate this by applying our framework to a series of binary classification tasks using Gaussian and Log-normal random fields and relating its outputs to the analytical NPCFs describing the two fields. Furthermore, we exhibit the model's ability to distinguish different dark energy scenarios ($w_0=-0.95$ and $-1.05$) using N-body simulated weak lensing convergence maps and discuss the physical implications coming from their interpretability.
With these tests, we show that C3NN combines advanced aspects of machine learning architectures with the framework of cosmological NPCFs, thereby making it an exciting tool with the potential to extract physical insights in a robust and explainable way from observational data.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 19 pages, 8 figures, 5 tables; Comments are welcome!

arXiv:2312.09311 [pdf, other]

Time Delay Cosmography: Analysis of Quadruply Lensed QSO SDSSJ1433 from Wendelstein Observatory

Authors: G. Queirolo, S. Seitz, A. Riffeser, M. Kluge, R. Bender, C. Gössl, U. Hopp, C. Ries, M. Schmidt, R. Zöller

Abstract: The goal of this work is to obtain a Hubble constant estimate through the study of the quadruply lensed, variable QSO SDSSJ1433+6007. To achieve this we combine multi-filter, archival $\textit{HST}$ data for lens modelling and a dedicated time delay monitoring campaign with the 2.1m Fraunhofer telescope at the $\textit{Wendelstein Observatory}$. The lens modelling is carried out with the public $\texttt{lenstronomy}$ Python package for each of the filters individually. Through this approach, we find that the data in one of the $\textit{HST}$ filters (F160W) contain a light contaminant that, had it remained undetected, would have severely biased the lensing potentials and thus our cosmological inference. After rejecting these data we obtain a combined posterior for the Fermat potential differences from the lens modelling in the remaining filters (F475X, F814W, F105W and F140W) with a precision of $\sim6\%$. The analysis of the $\textit{g'}$-band Wendelstein light curve data is carried out with a free-knot spline fitting method implemented in the public Python $\texttt{PyCS3}$ tools. The precision of the time delays between the QSO images ranges from 7.5 to 9.8$\%$, depending on the brightness of the images and their time delay. We then combine the posteriors for the Fermat potential differences and time delays. Assuming a flat $\Lambda$CDM cosmology, we infer a Hubble parameter of $H_0=76.6^{+7.7}_{-7.0}\,\mathrm{km\,s^{-1}\,Mpc^{-1}}$, reaching $9.6\%$ uncertainty for a single system.
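The step "combine the posteriors for the Fermat potential differences and time delays" rests on the standard time-delay relation, which the abstract does not spell out. For an image pair $(i,j)$ with lens redshift $z_\mathrm{d}$ and angular diameter distances $D_\mathrm{d}$, $D_\mathrm{s}$, $D_\mathrm{ds}$:

```latex
\Delta t_{ij} = \frac{D_{\Delta t}}{c}\,\Delta\phi_{ij},
\qquad
D_{\Delta t} \equiv (1+z_\mathrm{d})\,\frac{D_\mathrm{d}\,D_\mathrm{s}}{D_\mathrm{ds}} \propto H_0^{-1}
\quad\Longrightarrow\quad
H_0 \propto \frac{\Delta\phi_{ij}}{\Delta t_{ij}},
\qquad
\frac{\sigma_{H_0}}{H_0} \approx \sqrt{\left(\frac{\sigma_{\Delta\phi}}{\Delta\phi}\right)^2 + \left(\frac{\sigma_{\Delta t}}{\Delta t}\right)^2}
```

As an illustrative first-order estimate only (it ignores correlations and the full posterior combination used in the paper), adding the $\sim6\%$ Fermat-potential precision and a $7.5\%$ delay precision in quadrature gives $\sqrt{0.06^2 + 0.075^2} \approx 0.096$, i.e. roughly $10\%$, the same order as the quoted single-system uncertainty.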

Submitted 14 December, 2023; originally announced December 2023.

Comments: 27 pages, 29 figures, to be submitted to MNRAS

arXiv:2312.02149 [pdf, other]

Generative Powers of Ten

Authors: Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steve Seitz, Ira Kemelmacher, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski

Abstract: We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.

Submitted 21 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Project page: https://powers-of-10.github.io/

arXiv:2311.12481 [pdf]

Interpretability is in the eye of the beholder: Human versus artificial classification of image segments generated by humans versus XAI

Authors: Romy Müller, Marius Thoß, Julian Ullrich, Steffen Seitz, Carsten Knoll

Abstract: The evaluation of explainable artificial intelligence is challenging, because automated and human-centred metrics of explanation quality may diverge. To clarify their relationship, we investigated whether human and artificial image classification benefit from the same visual explanations. In three experiments, we analysed human reaction times, errors, and subjective ratings while participants classified image segments. These segments either reflected human attention (eye movements, manual selections) or the outputs of two attribution methods explaining a ResNet (Grad-CAM, XRAI). We also had this model classify the same segments. Humans and the model largely agreed on the interpretability of attribution methods: Grad-CAM was easily interpretable for indoor scenes and landscapes, but not for objects, while the reverse pattern was observed for XRAI. Conversely, human and model performance diverged for human-generated segments. Our results caution against general statements about interpretability, as it varies with the explanation method, the explained images, and the agent interpreting them.

Submitted 12 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.03560 [pdf, other]

HRTF Estimation in the Wild

Authors: Vivek Jayaram, Ira Kemelmacher-Shlizerman, Steven M. Seitz

Abstract: Head Related Transfer Functions (HRTFs) play a crucial role in creating immersive spatial audio experiences. However, HRTFs differ significantly from person to person, and traditional methods for estimating personalized HRTFs are expensive, time-consuming, and require specialized equipment. We imagine a world where your personalized HRTF can be determined by capturing data through earbuds in everyday environments. In this paper, we propose a novel approach for deriving personalized HRTFs that only relies on in-the-wild binaural recordings and head tracking data. By analyzing how sounds change as the user rotates their head through different environments with different noise sources, we can accurately estimate their personalized HRTF. Our results show that our predicted HRTFs closely match ground-truth HRTFs measured in an anechoic chamber. Furthermore, listening studies demonstrate that our personalized HRTFs significantly improve sound localization and reduce front-back confusion in virtual environments. Our approach offers an efficient and accessible method for deriving personalized HRTFs and has the potential to greatly improve spatial audio experiences.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 9 pages. Presented at UIST '23

arXiv:2310.08534 [pdf, other]

doi 10.1145/3610548.3618230

Animating Street View

Authors: Mengyi Shan, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz

Abstract: We present a system that automatically brings street view imagery to life by populating it with naturally behaving, animated pedestrians and vehicles. Our approach is to remove existing people and vehicles from the input image, insert moving objects with proper scale, angle, motion, and appearance, plan paths and traffic behavior, and render the scene with plausible occlusion and shadowing effects. The system achieves this by reconstructing the still image street scene, simulating crowd behavior, and rendering with consistent lighting, visibility, occlusions, and shadows. We demonstrate results on a diverse range of street scenes including regular still images and panoramas.

Submitted 12 October, 2023; originally announced October 2023.

Comments: SIGGRAPH Asia 2023 Conference Track

arXiv:2308.14740 [pdf, other]

Total Selfie: Generating Full-Body Selfies

Authors: Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz

Abstract: We present a method to generate full-body selfies from photographs originally taken at arm's length. Because self-captured photos are typically taken close up, they have a limited field of view and exaggerated perspective that distorts facial shapes. We instead seek to generate the photo someone else would take of you from a few feet away. Our approach takes as input four selfies of your face and body and a background image, and generates a full-body selfie in a desired target pose. We introduce a novel diffusion-based approach to combine all of this information into high-quality, well-composed photos of you with the desired pose and background.

Submitted 3 April, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Project page: https://homes.cs.washington.edu/~boweiche/project_page/totalselfie/

arXiv:2307.13345 [pdf]

Do humans and Convolutional Neural Networks attend to similar areas during scene classification: Effects of task and image type

Authors: Romy Müller, Marcel Dürschmidt, Julian Ullrich, Carsten Knoll, Sascha Weber, Steffen Seitz

Abstract: Deep Learning models like Convolutional Neural Networks (CNNs) are powerful image classifiers, but what factors determine whether they attend to similar image areas as humans do? While previous studies have focused on technological factors, little is known about the role of factors that affect human attention. In the present study, we investigated how the tasks used to elicit human attention maps interact with image characteristics in modulating the similarity between humans and CNNs. We varied the intentionality of human tasks, ranging from spontaneous gaze during categorization, through intentional gaze-pointing, to manual area selection. Moreover, we varied the type of image to be categorized, using either singular, salient objects, indoor scenes consisting of object arrangements, or landscapes without distinct objects defining the category. The human attention maps generated in this way were compared to the CNN attention maps revealed by explainable artificial intelligence (Grad-CAM). The influence of human tasks strongly depended on image type: for objects, human manual selection produced maps that were most similar to the CNN, while the specific eye movement task had little impact. For indoor scenes, spontaneous gaze produced the least similarity, while for landscapes, similarity was equally low across all human tasks. To better understand these results, we also compared the different human attention maps to each other. Our results highlight the importance of taking human factors into account when comparing the attention of humans and CNNs.
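For readers unfamiliar with Grad-CAM, the sketch below shows the standard computation on one convolutional layer: channel weights are the spatially averaged gradients of the class score, and the map is the ReLU of the weighted sum of feature maps. The abstract does not name the similarity metric used to compare human and CNN maps; Pearson correlation in `map_similarity` is an assumption chosen for illustration, and both function names are hypothetical.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map for one image and one target class.

    activations: (K, H, W) feature maps A_k of a convolutional layer
    gradients:   (K, H, W) gradients of the class score w.r.t. those feature maps
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam            # scale to [0, 1] for comparison

def map_similarity(map_a, map_b):
    """Pearson correlation between two equally shaped attention maps (assumed metric)."""
    return float(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])
```

In practice the Grad-CAM map is upsampled to the image resolution before it is compared with a human attention map; that resizing step is omitted here.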

Submitted 15 October, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.08466 [pdf, other]

Generalizable Classification of UHF Partial Discharge Signals in Gas-Insulated HVDC Systems Using Neural Networks

Authors: Steffen Seitz, Thomas Götz, Christopher Lindenberg, Ronald Tetzlaff, Stephan Schlegel

Abstract: Undetected partial discharges (PDs) are a safety-critical issue in high-voltage (HV) gas-insulated systems (GIS). While the diagnosis of PDs under AC voltage is well-established, the analysis of PDs under DC voltage remains an active research field. A key focus of these investigations is the classification of different PD sources to enable subsequent sophisticated analysis. In this paper, we propose and analyze a neural network-based approach for classifying PD signals caused by metallic protrusions and conductive particles on the insulator of HVDC GIS, without relying on pulse sequence analysis features. In contrast to previous approaches, our proposed model can discriminate the studied PD signals obtained at negative and positive potentials, while also generalizing to unseen operating voltage multiples. Additionally, we compare the performance of time- and frequency-domain input signals and explore the impact of different normalization schemes to mitigate the influence of free-space path loss between the sensor and defect location.
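The abstract mentions time- versus frequency-domain inputs and normalization against path loss without naming the schemes. The sketch below shows two generic candidates, peak and unit-energy normalization, together with a magnitude-spectrum input. It assumes path loss acts as a single scalar attenuation of the received signal; real free-space path loss is frequency dependent, so this is a simplification, and none of these functions are taken from the paper.

```python
import numpy as np

def magnitude_spectrum(signal):
    """Frequency-domain input: one-sided FFT magnitude of a real time-domain signal."""
    return np.abs(np.fft.rfft(signal))

def peak_normalize(x):
    """Scale so that the largest absolute value is 1 (removes any global gain)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def energy_normalize(x):
    """Scale to unit L2 norm (also removes any global gain)."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x
```

Under the scalar-attenuation assumption, a signal measured far from the defect and the same signal measured close by map to the same normalized input, which is the property such schemes aim for.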

Submitted 18 July, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 8 pages, submitted to IEEE Transactions on Power Delivery

arXiv:2305.17132 [pdf, other]

Beyond 3$\times$2-point cosmology: the integrated shear and galaxy 3-point correlation functions

Authors: Anik Halder, Zhengyangguang Gong, Alexandre Barreira, Oliver Friedrich, Stella Seitz, Daniel Gruen

Abstract: We present the integrated 3-point correlation functions (3PCF) involving both the cosmic shear and the galaxy density fields. These are a set of higher-order statistics that describe the modulation of local 2-point correlation functions (2PCF) by large-scale features in the fields, and which are easy to measure from galaxy imaging surveys. Based on previous works on the shear-only integrated 3PCF, we develop the theoretical framework for modelling 5 new statistics involving the galaxy field and its cross-correlations with cosmic shear. Using realistic galaxy and cosmic shear mocks from simulations, we determine the regime of validity of our models based on leading-order standard perturbation theory with an MCMC analysis that recovers unbiased constraints of the amplitude of fluctuations parameter $A_s$ and the linear and quadratic galaxy bias parameters $b_1$ and $b_2$. Using Fisher matrix forecasts for a DES-Y3-like survey, relative to baseline analyses with conventional 3$\times$2PCFs, we find that the addition of the shear-only integrated 3PCF can improve cosmological parameter constraints by $20-40\%$. The subsequent addition of the new statistics introduced in this paper can lead to further improvements of $10-20\%$, even when utilizing only conservatively large scales where the tree-level models are valid. Our results motivate future work on the galaxy and shear integrated 3PCFs, which offer a practical way to extend standard analyses based on 3$\times$2PCFs to systematically probe the non-Gaussian information content of cosmic density fields.
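The percentage improvements above come from Fisher matrix forecasts. A minimal sketch of that bookkeeping, assuming a Gaussian likelihood with a parameter-independent covariance, is shown below; the function names are hypothetical and the toy numbers are not from the paper.

```python
import numpy as np

def fisher_matrix(jacobian, covariance):
    """F = J^T C^{-1} J.

    jacobian:   (n_data, n_params) derivatives of the model data vector w.r.t. parameters
    covariance: (n_data, n_data) data covariance, assumed independent of the parameters
    """
    return jacobian.T @ np.linalg.solve(covariance, jacobian)

def marginal_errors(fisher):
    """1-sigma marginalized errors: square roots of the diagonal of F^{-1}."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

def percent_improvement(sigma_baseline, sigma_combined):
    """Relative shrinkage of each marginalized error, in percent."""
    return 100.0 * (1.0 - np.asarray(sigma_combined) / np.asarray(sigma_baseline))
```

Fisher matrices simply add only for independent data sets. Statistics measured on the same sky, such as 2PCFs and integrated 3PCFs, should instead be stacked into one data vector with a joint covariance that includes their cross-covariance before calling `fisher_matrix`.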

Submitted 26 May, 2023; originally announced May 2023.

Comments: 19 pages, 8 figures + appendix. Comments are welcome!

arXiv:2304.01187 [pdf, other]

doi 10.1088/1475-7516/2023/07/040

Cosmology from the integrated shear 3-point correlation function: simulated likelihood analyses with machine-learning emulators

Authors: Zhengyangguang Gong, Anik Halder, Alexandre Barreira, Stella Seitz, Oliver Friedrich

Abstract: The integrated shear 3-point correlation function $\zeta_{\pm}$ measures the correlation between the local shear 2-point function $\xi_{\pm}$ and the 1-point shear aperture mass in patches of the sky. Unlike other higher-order statistics, $\zeta_{\pm}$ can be efficiently measured from cosmic shear data, and it admits accurate theory predictions on a wide range of scales as a function of cosmological and baryonic feedback parameters. Here, we develop and test a likelihood analysis pipeline for cosmological constraints using $\zeta_{\pm}$. We incorporate treatment of systematic effects from photometric redshift uncertainties, shear calibration bias and galaxy intrinsic alignments. We also develop an accurate neural-network emulator for fast theory predictions in MCMC parameter inference analyses. We test our pipeline using realistic cosmic shear maps based on $N$-body simulations with a DES Y3-like footprint, mask and source tomographic bins, finding unbiased parameter constraints. Relative to $\xi_{\pm}$-only, adding $\zeta_{\pm}$ can lead to $\approx 10-25\%$ improvements on the constraints of parameters like $A_s$ (or $\sigma_8$) and $w_0$. We find no evidence in $\xi_{\pm} + \zeta_{\pm}$ constraints of a significant mitigation of the impact of systematics. We also investigate the impact of the size of the apertures where $\zeta_{\pm}$ is measured, and of the strategy to estimate the covariance matrix ($N$-body vs. lognormal).
Our analysis solidifies the strong potential of the $\zeta_{\pm}$ statistic and puts forward a pipeline that can be readily used to improve cosmological constraints using real cosmic shear data.
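Following the definition in the abstract, the integrated 3PCF is the patch-averaged product of the 1-point aperture mass and the local 2PCF measured in the same patch, $\zeta_{\pm}(\theta) = \langle M_\mathrm{ap}\,\xi_{\pm}(\theta)\rangle$. The sketch below is a simplified estimator assuming equal weights for all patches and per-patch measurements that are already available; production estimators also handle pair weights, masks, and area normalization, and the function name is hypothetical.

```python
import numpy as np

def integrated_3pcf(aperture_mass, local_xi):
    """Patch-averaged estimate of zeta(theta) = <M_ap * xi(theta)>.

    aperture_mass: (n_patches,) 1-point aperture mass M_ap measured in each patch
    local_xi:      (n_patches, n_bins) local shear 2PCF (xi_+ or xi_-) in each patch
    returns:       (n_bins,) zeta in each angular bin, with equal weight per patch (assumed)
    """
    aperture_mass = np.asarray(aperture_mass, dtype=float)
    local_xi = np.asarray(local_xi, dtype=float)
    return np.mean(aperture_mass[:, None] * local_xi, axis=0)
```

Because $\langle M_\mathrm{ap}\rangle$ vanishes in theory, a nonzero result signals that patches with more (or less) projected mass also host systematically stronger (or weaker) local shear correlations, which is the squeezed 3-point information the statistic targets.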

Submitted 14 July, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: 21 pages, 11 figures, 3 tables. Comments welcome

Journal ref: JCAP07(2023)040

arXiv:2212.10521 [pdf]

Single virus fingerprinting by widefield interferometric defocus-enhanced mid-infrared photothermal microscopy

Authors: Qing Xia, Zhongyue Guo, Haonan Zong, Scott Seitz, Celalettin Yurdakul, M. Selim Unlu, Le Wang, John H. Connor, Ji-Xin Cheng

Abstract: Clinical identification and fundamental study of viruses rely on the detection of viral proteins or viral nucleic acids. Yet, amplification-based and antigen-based methods are not able to provide precise compositional information of individual virions due to small particle size and low-abundance chemical contents (e.g., ~5000 proteins in a vesicular stomatitis virus). Here, we report a widefield interferometric defocus-enhanced mid-infrared photothermal (WIDE-MIP) microscope for high-throughput fingerprinting of single viruses. With the identification of feature absorption peaks, WIDE-MIP reveals the contents of viral proteins and nucleic acids in single DNA vaccinia viruses and RNA vesicular stomatitis viruses. Distinct nucleic acid signatures of thymine and uracil residue vibrations are obtained to differentiate DNA and RNA viruses. WIDE-MIP imaging further reveals enriched β-sheet components in DNA varicella-zoster virus proteins. Together, these advances open a new avenue for compositional analysis of viral vectors and elucidating protein function in an assembled virion.

Submitted 4 August, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2206.13611 [pdf, other]

doi 10.1145/3498361.3538933

ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement

Authors: Ishan Chatterjee, Maruchi Kim, Vivek Jayaram, Shyamnath Gollakota, Ira Kemelmacher-Shlizerman, Shwetak Patel, Steven M. Seitz

Abstract: We present ClearBuds, the first hardware and software system that utilizes a neural network to enhance speech streamed from two wireless earbuds. Real-time speech enhancement for wireless earbuds requires high-quality sound separation and background cancellation, operating in real time and on a mobile phone. ClearBuds bridges state-of-the-art deep learning for blind audio source separation and in-ear mobile systems by making two key technical contributions: 1) a new wireless earbud design capable of operating as a synchronized, binaural microphone array, and 2) a lightweight dual-channel speech enhancement neural network that runs on a mobile device. Our neural network has a novel cascaded architecture that combines a time-domain convolutional neural network with a spectrogram-based frequency masking neural network to reduce the artifacts in the audio output. Results show that our wireless earbuds achieve a synchronization error of less than 64 microseconds and our network has a runtime of 21.4 milliseconds on an accompanying mobile phone. In-the-wild evaluation with eight users in previously unseen indoor and outdoor multipath scenarios demonstrates that our neural network generalizes to learn both spatial and acoustic cues to perform noise suppression and background speech removal. In a user study with 37 participants who spent over 15.4 hours rating 1041 audio samples collected in the wild, our system achieves improved mean opinion score and background noise suppression. Project page with demos: https://clearbuds.cs.washington.edu

Submitted 27 June, 2022; originally announced June 2022.

Comments: 12 pages, Published in MobiSys 2022

arXiv:2205.12797 [pdf, other]

Gradient-based explanations for Gaussian Process regression and classification models

Authors: Sarem Seitz

Abstract: Gaussian Processes (GPs) have proven themselves to be a reliable and effective method in probabilistic Machine Learning. Thanks to recent and current advances, modeling complex data with GPs is becoming more and more feasible. Thus, these types of models are, nowadays, an interesting alternative to Neural and Deep Learning methods, which are arguably the current state of the art in Machine Learning. For the latter, we see an increasing interest in so-called explainable approaches, in essence methods that aim to make a Machine Learning model's decision process transparent to humans. Such methods are particularly needed when illogical or biased reasoning can lead to actual disadvantageous consequences for humans. Ideally, explainable Machine Learning should help detect such flaws in a model and aid a subsequent debugging process. One active line of research in Machine Learning explainability is gradient-based methods, which have been successfully applied to complex neural networks. Given that GPs are closed under differentiation, gradient-based explainability for GPs appears to be a promising field of research. This paper is primarily focused on explaining GP classifiers via gradients where, contrary to GP regression, derivative GPs are not straightforward to obtain.
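To make "closed under differentiation" concrete for the regression case (the easy case the abstract contrasts with classification), the sketch below differentiates a GP regression posterior mean analytically. It assumes a squared-exponential (RBF) kernel, $k(x,x') = s^2 \exp(-\lVert x-x'\rVert^2 / 2\ell^2)$, for which $\nabla_{x_*} m(x_*) = -\ell^{-2}\sum_i \alpha_i\, k(x_*,x_i)\,(x_* - x_i)$ with $\alpha = (K + \sigma_n^2 I)^{-1} y$. The classifier case studied in the paper is not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between rows of A (n, D) and B (m, D)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def fit_alpha(X, y, lengthscale=1.0, variance=1.0, noise=1e-2):
    """alpha = (K + noise * I)^{-1} y, the weights of the GP posterior mean."""
    K = rbf(X, X, lengthscale, variance) + noise * np.eye(len(X))
    return np.linalg.solve(K, y)

def posterior_mean(x_star, X, alpha, lengthscale=1.0, variance=1.0):
    """m(x*) = k(x*, X) @ alpha for a single test point x_star of shape (D,)."""
    return rbf(x_star[None, :], X, lengthscale, variance)[0] @ alpha

def posterior_mean_grad(x_star, X, alpha, lengthscale=1.0, variance=1.0):
    """Analytic input gradient of the posterior mean, shape (D,): a per-feature attribution."""
    k = rbf(x_star[None, :], X, lengthscale, variance)[0]  # (N,)
    diffs = x_star[None, :] - X                              # (N, D)
    return -(alpha * k) @ diffs / lengthscale**2
```

The gradient can be checked against finite differences of `posterior_mean`; each of its components reads as the local sensitivity of the prediction to one input feature.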

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2204.10575 [pdf, other]

A piece-wise constant approximation for non-conjugate Gaussian Process models

Authors: Sarem Seitz

Abstract: Gaussian Processes (GPs) are a versatile and popular method in Bayesian Machine Learning. A common modification is Sparse Variational Gaussian Processes (SVGPs), which are well suited to deal with large datasets. While GPs can elegantly handle Gaussian-distributed target variables in closed form, their applicability can be extended to non-Gaussian data as well. These extensions are usually impossible to treat in closed form and hence require approximate solutions. This paper proposes to approximate the inverse-link function, which is necessary when working with non-Gaussian likelihoods, by a piece-wise constant function. It will be shown that this yields a closed form solution for the corresponding SVGP lower bound. In addition, it is demonstrated how the piece-wise constant function itself can be optimized, resulting in an inverse-link function that can be learnt from the data at hand.
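The reason a piece-wise constant inverse link gives a closed form is that, under a Gaussian marginal $q(f) = \mathcal{N}(\mu, \sigma^2)$, the probability of each constant segment is a difference of normal CDFs, so the expected log-likelihood term of the SVGP bound becomes a finite sum. The sketch below shows this for a Bernoulli likelihood, which is an assumption chosen for illustration; the breakpoints and levels are fixed inputs here rather than learnt, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bin_probabilities(mu, sigma, boundaries):
    """P(f in bin j) under f ~ N(mu, sigma^2).

    boundaries: sorted interior breakpoints b_1 < ... < b_{J-1}; outer edges are -inf and +inf
    """
    cdfs = [0.0] + [normal_cdf((b - mu) / sigma) for b in boundaries] + [1.0]
    return [upper - lower for lower, upper in zip(cdfs[:-1], cdfs[1:])]

def expected_bernoulli_loglik(y, mu, sigma, boundaries, levels):
    """Closed-form E_q[log p(y | f)] when the inverse link equals levels[j] on bin j.

    levels: J = len(boundaries) + 1 success probabilities, each strictly in (0, 1)
    """
    probs = bin_probabilities(mu, sigma, boundaries)
    return sum(p * math.log(c if y == 1 else 1.0 - c) for p, c in zip(probs, levels))
```

Since the result is a smooth function of $\mu$, $\sigma$, and the levels, it can be differentiated directly, which is consistent with the abstract's point that the piece-wise constant function itself can be optimized.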

Submitted 22 April, 2022; originally announced April 2022.

arXiv:2111.07986 [pdf, other]

Nonprehensile Riemannian Motion Predictive Control

Authors: Hamid Izadinia, Byron Boots, Steven M. Seitz

Abstract: Nonprehensile manipulation involves long-horizon underactuated object interactions and physical contact with different objects that can inherently introduce a high degree of uncertainty. In this work, we introduce a novel Real-to-Sim reward analysis technique, called Riemannian Motion Predictive Control (RMPC), to reliably imagine and predict the outcome of taking possible actions for a real robotic platform. Our proposed RMPC benefits from a Riemannian motion policy and a second-order dynamic model to compute the acceleration command and control the robot at every location on the surface. Our approach creates a 3D object-level recomposed model of the real scene where we can simulate the effect of different trajectories. We produce a closed-loop controller to reactively push objects in a continuous action space. We evaluate the performance of our RMPC approach by conducting experiments on a real robot platform as well as in simulation and compare against several baselines. We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.

Submitted 15 November, 2021; originally announced November 2021.

Comments: To appear at International Symposium on Experimental Robotics (ISER)

arXiv:2109.03708 [pdf, other]

Self-explaining variational posterior distributions for Gaussian Process models

Authors: Sarem Seitz

Abstract: Bayesian methods have become a popular way to incorporate prior knowledge and a notion of uncertainty into machine learning models. At the same time, the complexity of modern machine learning makes it challenging to comprehend a model's reasoning process, let alone express specific prior assumptions in a rigorous manner. While they primarily address the former issue, recent developments in transparent machine learning could also broaden the range of prior information that we can provide to complex Bayesian models. Inspired by the idea of self-explaining models, we introduce a corresponding concept for variational Gaussian Processes. On the one hand, our contribution improves transparency for these types of models. More importantly though, our proposed self-explaining variational posterior distribution allows us to incorporate both general prior knowledge about a target function as a whole and prior knowledge about the contribution of individual features.

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2109.03305 [pdf, other]

doi 10.1051/0004-6361/202142168

CLASH-VLT: Abell S1063. Cluster assembly history and spectroscopic catalogue

Authors: A. Mercurio, P. Rosati, A. Biviano, M. Annunziatella, M. Girardi, B. Sartoris, M. Nonino, M. Brescia, G. Riccio, C. Grillo, I. Balestra, G. B. Caminha, G. De Lucia, R. Gobat, S. Seitz, P. Tozzi, M. Scodeggio, E. Vanzella, G. Angora, P. Bergamini, S. Borgani, R. Demarco, M. Meneghetti, V. Strazzullo, L. Tortorelli , et al. (9 additional authors not shown)

Abstract: Using the CLASH-VLT survey, we assembled an unprecedented sample of 1234 spectroscopically confirmed members in Abell S1063, finding a dynamically complex structure at $z_{cl}=0.3457$ with a velocity dispersion $\sigma_v=1380^{+26}_{-32}$ km s$^{-1}$. We investigate cluster environmental and dynamical effects by analysing the projected phase-space diagram and the orbits as a function of galaxy spectral properties. We classify cluster galaxies according to the presence and strength of the [OII] emission line, the strength of the H$δ$ absorption line, and colours. We investigate the relationship between the spectral classes of galaxies and their position in the projected phase-space diagram. We analyse red and blue galaxy orbits separately. By correlating the observed positions and velocities with the projected phase-space constructed from simulations, we constrain the accretion redshift of galaxies with different spectral types. Passive galaxies are mainly located in the virialised region, while emission-line galaxies are outside $r_{200}$ and are accreted later into the cluster. Emission-line and post-starburst galaxies show an asymmetric distribution in projected phase-space within $r_{200}$, with the former being prominent at $\Delta v/\sigma \lesssim -1.5$ and the latter at $\Delta v/\sigma \gtrsim 1.5$, suggesting that backsplash galaxies lie at large positive velocities. We find that low-mass passive galaxies are accreted into the cluster before the high-mass ones.
This suggests that the only low-mass galaxies we observe as passive are those accreted early into the cluster as blue galaxies, which have had time to quench their star formation. We also find that red galaxies move on more radial orbits than blue galaxies. This can be explained if infalling galaxies can remain blue while moving on tangential orbits.

Submitted 3 November, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

Comments: 24 pages, 28 figures, 3 tables, Accepted for publication in Astronomy and Astrophysics

Journal ref: A&A 656, A147 (2021)

arXiv:2106.13228 [pdf, other]

HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields

Authors: Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, Steven M. Seitz

Abstract: Neural Radiance Fields (NeRF) are able to reconstruct scenes with unprecedented fidelity, and various recent works have extended NeRF to handle dynamic scenes. A common approach to reconstruct such non-rigid scenes is through the use of a learned deformation field mapping from coordinates in each input image into a canonical template coordinate space. However, these deformation-based approaches struggle to model changes in topology, as topological changes require a discontinuity in the deformation field, but these deformation fields are necessarily continuous. We address this limitation by lifting NeRFs into a higher dimensional space, and by representing the 5D radiance field corresponding to each individual input image as a slice through this "hyper-space". Our method is inspired by level set methods, which model the evolution of surfaces as slices through a higher dimensional surface. We evaluate our method on two tasks: (i) interpolating smoothly between "moments", i.e., configurations of the scene, seen in the input images while maintaining visual plausibility, and (ii) novel-view synthesis at fixed moments. We show that our method, which we dub HyperNeRF, outperforms existing methods on both tasks. Compared to Nerfies, HyperNeRF reduces average error rates by 4.1% for interpolation and 8.6% for novel-view synthesis, as measured by LPIPS. Additional videos, results, and visualizations are available at https://hypernerf.github.io.

Submitted 10 September, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

Comments: SIGGRAPH Asia 2021, Project page: https://hypernerf.github.io/

arXiv:2106.08438 [pdf, other]

doi 10.1093/mnras/stab3155

Dark Energy Survey Year 3 results: Galaxy-halo connection from galaxy-galaxy lensing

Authors: G. Zacharegkas, C. Chang, J. Prat, S. Pandey, I. Ferrero, J. Blazek, B. Jain, M. Crocce, J. DeRose, A. Palmese, S. Seitz, E. Sheldon, W. G. Hartley, R. H. Wechsler, S. Dodelson, P. Fosalba, E. Krause, Y. Park, C. Sánchez, A. Alarcon, A. Amon, K. Bechtol, M. R. Becker, G. M. Bernstein, A. Campos , et al. (92 additional authors not shown)

Abstract: Galaxy-galaxy lensing is a powerful probe of the connection between galaxies and their host dark matter halos, which is important both for galaxy evolution and for cosmology. We extend the measurement and modeling of the galaxy-galaxy lensing signal in the recent Dark Energy Survey Year 3 cosmology analysis to the highly nonlinear scales ($\sim 100$ kpc). This extension enables us to study the galaxy-halo connection via a Halo Occupation Distribution (HOD) framework for the two lens samples used in the cosmology analysis: a luminous red galaxy sample (redMaGiC) and a magnitude-limited galaxy sample (MagLim). We find that redMaGiC (MagLim) galaxies typically live in dark matter halos of mass $\log_{10}(M_{h}/M_{\odot}) \approx 13.7$ which is roughly constant over redshift ($13.3-13.5$ depending on redshift). We constrain these masses to $\sim 15\%$, an approximately $1.5$-fold improvement over previous work. We also constrain the linear galaxy bias more than 5 times better than what is inferred from the cosmological scales alone. We find the satellite fraction for redMaGiC (MagLim) to be $\sim 0.1-0.2$ ($0.1-0.3$) with no clear trend in redshift. Our constraints on these halo properties are broadly consistent with other available estimates from previous work, large-scale constraints and simulations. The framework built in this paper will be used for future HOD studies with other galaxy samples and extensions for cosmological analyses.

Submitted 2 March, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: 32 pages, 21 figures, accepted for publication in MNRAS

Report number: FERMILAB-PUB-21-264-AE

arXiv:2105.08051 [pdf, other]

A Light Stage on Every Desk

Authors: Soumyadip Sengupta, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz

Abstract: Every time you sit in front of a TV or monitor, your face is actively illuminated by time-varying patterns of light. This paper proposes to use this time-varying illumination for synthetic relighting of your face with any new illumination condition. In doing so, we take inspiration from the light stage work of Debevec et al., who first demonstrated the ability to relight people captured in a controlled lighting environment. Whereas existing light stages require expensive, room-scale spherical capture gantries and exist in only a few labs in the world, we demonstrate how to acquire useful data from a normal TV or desktop monitor. Instead of subjecting the user to uncomfortable rapidly flashing light patterns, we operate on images of the user watching a YouTube video or other standard content. We train a deep network on images plus monitor patterns of a given user and learn to predict images of that user under any target illumination (monitor pattern). Experimental evaluation shows that our method produces realistic relighting results. Video results are available at http://grail.cs.washington.edu/projects/Light_Stage_on_Every_Desk/.

Submitted 11 November, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Updated citations from v1

arXiv:2104.09185 [pdf, other]

Mixtures of Gaussian Processes for regression under multiple prior distributions

Authors: Sarem Seitz

Abstract: When constructing a Bayesian Machine Learning model, we might be faced with multiple different prior distributions and thus need to account for them in a sensible manner in our model. While this situation is reasonably well explored for classical Bayesian Statistics, it appears useful to develop a corresponding method for complex Machine Learning problems. Given their underlying Bayesian framework and their widespread popularity, Gaussian Processes are a good candidate to tackle this task. We therefore extend the idea of Mixture models for Gaussian Process regression in order to work with multiple prior beliefs at once; both an analytical regression formula and a Sparse Variational approach are considered. In addition, we consider using our approach to account for the problem of prior misspecification in functional regression problems.

Submitted 19 April, 2021; originally announced April 2021.

arXiv:2103.16183 [pdf, other]

Repopulating Street Scenes

Authors: Yifan Wang, Andrew Liu, Richard Tucker, Jiajun Wu, Brian L. Curless, Steven M. Seitz, Noah Snavely

Abstract: We present a framework for automatically reconfiguring images of street scenes by populating, depopulating, or repopulating them with objects such as pedestrians or vehicles. Applications of this method include anonymizing images to enhance privacy, generating data augmentations for perception tasks like autonomous driving, and composing scenes to achieve a certain ambiance, such as empty streets in the early morning. At a technical level, our work has three primary contributions: (1) a method for clearing images of objects, (2) a method for estimating sun direction from a single image, and (3) a way to compose objects in scenes that respects scene geometry and illumination. Each component is learned from data with minimal ground truth annotations, by making creative use of large numbers of short image bursts of street scenes. We demonstrate convincing results on a range of street scenes and illustrate potential applications.

Submitted 30 March, 2021; originally announced March 2021.

Comments: CVPR 2021

arXiv:2102.10414 [pdf, other]

doi 10.1093/mnras/stab3269

Synthetic Galaxy Clusters and Observations Based on Dark Energy Survey Year 3 Data

Authors: T. N. Varga, D. Gruen, S. Seitz, N. MacCrann, E. Sheldon, W. G. Hartley, A. Amon, A. Choi, A. Palmese, Y. Zhang, M. R. Becker, J. McCullough, E. Rozo, E. S. Rykoff, C. To, S. Grandis, G. M. Bernstein, S. Dodelson, K. Eckert, S. Everett, R. A. Gruendl, I. Harrison, K. Herner, R. P. Rollins, I. Sevilla-Noarbe , et al. (53 additional authors not shown)

Abstract: We develop a novel data-driven method for generating synthetic optical observations of galaxy clusters. In cluster weak lensing, the interplay between analysis choices and systematic effects related to source galaxy selection, shape measurement and photometric redshift estimation can be best characterized in end-to-end tests going from mock observations to recovered cluster masses. To create such test scenarios, we measure and model the photometric properties of galaxy clusters and their sky environments from the Dark Energy Survey Year 3 (DES Y3) data in two bins of cluster richness $λ\in[30;\,45)$, $λ\in[45;\,60)$ and three bins in cluster redshift ($z\in[0.3;\,0.35)$, $z\in[0.45;\,0.5)$ and $z\in[0.6;\,0.65)$. Using deep-field imaging data we extrapolate galaxy populations beyond the limiting magnitude of DES Y3 and calculate the properties of cluster member galaxies via statistical background subtraction. We construct mock galaxy clusters as random draws from a distribution function, and render mock clusters and line-of-sight catalogs into synthetic images in the same format as actual survey observations. Synthetic galaxy clusters are generated from real observational data, and thus are independent of the assumptions inherent to cosmological simulations. The recipe can be straightforwardly modified to incorporate extra information and to correct for survey incompleteness.
New realizations of synthetic clusters can be created at minimal cost, which will allow future analyses to generate the large number of images needed to characterize systematic uncertainties in cluster mass measurements.

Submitted 31 January, 2022; v1 submitted 20 February, 2021; originally announced February 2021.

Comments: accepted to MNRAS, 22 pages, 14 figures

arXiv:2102.10177 [pdf, other]

doi 10.1093/mnras/stab1801

The integrated 3-point correlation function of cosmic shear

Authors: Anik Halder, Oliver Friedrich, Stella Seitz, Tamas N. Varga

Abstract: We present the integrated 3-point shear correlation function $iζ_{\pm}$ -- a higher-order statistic of the cosmic shear field -- which can be directly estimated in wide-area weak lensing surveys without measuring the full 3-point shear correlation function, making this a practical and complementary tool to 2-point statistics for weak lensing cosmology. We define it as the 1-point aperture mass statistic $M_{\mathrm{ap}}$ measured at different locations on the shear field correlated with the corresponding local 2-point shear correlation function $ξ_{\pm}$. Building upon existing work on the integrated bispectrum of the weak lensing convergence field, we present a theoretical framework for computing the integrated 3-point function in real space for any projected field within the flat-sky approximation and apply it to cosmic shear. Using analytical formulae for the non-linear matter power spectrum and bispectrum, we model $iζ_{\pm}$ and validate it on N-body simulations within the uncertainties expected from the sixth year cosmic shear data of the Dark Energy Survey. We also explore the Fisher information content of $iζ_{\pm}$ and perform a joint analysis with $ξ_{\pm}$ for two tomographic source redshift bins with realistic shape-noise to analyse its power in constraining cosmological parameters.
We find that the joint analysis of $ξ_{\pm}$ and $iζ_{\pm}$ has the potential to considerably improve parameter constraints from $ξ_{\pm}$ alone, and can be particularly useful in improving the figure of merit of the dynamical dark energy equation of state parameters from cosmic shear data.

Submitted 2 July, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

Comments: Accepted for publication in MNRAS; v2 matches the accepted manuscript; 18 pages + appendix

arXiv:2012.12261 [pdf, other]

doi 10.1145/3478513.3480485

Time-Travel Rephotography

Authors: Xuan Luo, Xuaner Zhang, Paul Yoo, Ricardo Martin-Brualla, Jason Lawrence, Steven M. Seitz

Abstract: Many historical people were only ever captured by old, faded, black and white photos, that are distorted due to the limitations of early cameras and the passage of time. This paper simulates traveling back in time with a modern camera to rephotograph famous subjects. Unlike conventional image restoration filters which apply independent operations like denoising, colorization, and superresolution, we leverage the StyleGAN2 framework to project old photos into the space of modern high-resolution photos, achieving all of these effects in a unified framework. A unique challenge with this approach is retaining the identity and pose of the subject in the original photo, while discarding the many artifacts frequently seen in low-quality antique photos. Our comparisons to current state-of-the-art restoration filters show significant improvements and compelling results for a variety of important historical people.

Submitted 13 December, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

Comments: SIGGRAPH Asia 2021. Project Page: https://time-travel-rephotography.github.io Video: https://youtu.be/ceIopN2UZ_s

Journal ref: ACM Transactions on Graphics. 40 (2021) 1-12

arXiv:2012.07810 [pdf, other]

Real-Time High-Resolution Background Matting

Authors: Shanchuan Lin, Andrey Ryabtsev, Soumyadip Sengupta, Brian Curless, Steve Seitz, Ira Kemelmacher-Shlizerman

Abstract: We introduce a real-time, high-resolution background replacement technique which operates at 30 fps in 4K resolution and 60 fps in HD on a modern GPU. Our technique is based on background matting, where an additional frame of the background is captured and used in recovering the alpha matte and the foreground layer. The main challenge is to compute a high-quality alpha matte, preserving strand-level hair details, while processing high-resolution images in real-time. To achieve this goal, we employ two neural networks: a base network computes a low-resolution result, which is refined by a second network operating at high resolution on selected patches. We introduce two large-scale video and image matting datasets: VideoMatte240K and PhotoMatte13K/85. Our approach yields higher quality results compared to the previous state-of-the-art in background matting, while simultaneously yielding a dramatic boost in both speed and resolution.

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2011.15128 [pdf, other]

Animating Pictures with Eulerian Motion Fields

Authors: Aleksander Holynski, Brian Curless, Steven M. Seitz, Richard Szeliski

Abstract: In this paper, we demonstrate a fully automatic method for converting a still image into a realistic animated looping video. We target scenes with continuous fluid motion, such as flowing water and billowing smoke. Our method relies on the observation that this type of natural motion can be convincingly reproduced from a static Eulerian motion description, i.e. a single, temporally constant flow field that defines the immediate motion of a particle at a given 2D location. We use an image-to-image translation network to encode motion priors of natural scenes collected from online videos, so that for a new photo, we can synthesize a corresponding motion field. The image is then animated using the generated motion through a deep warping technique: pixels are encoded as deep features, those features are warped via Eulerian motion, and the resulting warped feature maps are decoded as images. In order to produce continuous, seamlessly looping video textures, we propose a novel video looping technique that flows features both forward and backward in time and then blends the results. We demonstrate the effectiveness and robustness of our method by applying it to a large collection of examples including beaches, waterfalls, and flowing rivers.
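A static Eulerian field assigns each 2D location a per-frame displacement, so a particle's trajectory follows by repeatedly stepping through the same field. A minimal sketch of that integration (forward Euler) plus a forward/backward pairing for looping, tracking point positions only; the deep feature encoding, splatting and decoding are omitted, and the linear crossfade weight and the helper names `integrate` and `looping_sample` are hypothetical, not the paper's exact formulation:

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
FlowField = Callable[[float, float], Point]  # (x, y) -> per-frame displacement (dx, dy)

def integrate(flow: FlowField, start: Point, steps: int, direction: int = 1) -> List[Point]:
    """Trace a particle through a static flow field with forward Euler:
    p_{t+1} = p_t + direction * M(p_t). direction=-1 follows the negated
    field, an approximation of moving backward in time."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = flow(x, y)
        x, y = x + direction * dx, y + direction * dy
        path.append((x, y))
    return path

def looping_sample(flow: FlowField, start: Point, t: int, n_frames: int):
    """For frame t of an n_frames loop: where a source pixel lands when pushed
    forward t steps, where it lands when pushed backward n_frames - t steps,
    and a (hypothetical) linear crossfade weight for the forward result."""
    forward = integrate(flow, start, t, direction=1)[-1]
    backward = integrate(flow, start, n_frames - t, direction=-1)[-1]
    w_forward = 1.0 - t / n_frames
    return forward, backward, w_forward

# Usage: a uniform rightward drift, as a stand-in for a predicted motion field.
drift = lambda x, y: (1.0, 0.5)
print(integrate(drift, (0.0, 0.0), 3)[-1])  # (3.0, 1.5)
```

With this pairing, frame 0 is the unwarped forward result at full weight and frame n_frames is the unwarped backward result at full weight, so both ends of the loop reproduce the input image and the loop closes without a visible seam.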

Submitted 30 November, 2020; originally announced November 2020.

arXiv:2011.12948 [pdf, other]

Nerfies: Deformable Neural Radiance Fields

Authors: Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Steven M. Seitz, Ricardo Martin-Brualla

Abstract: We present the first method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones. Our approach augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. We observe that these NeRF-like deformation fields are prone to local minima, and propose a coarse-to-fine optimization method for coordinate-based models that allows for more robust optimization. By adapting principles from geometry processing and physical simulation to NeRF-like models, we propose an elastic regularization of the deformation field that further improves robustness. We show that our method can turn casually captured selfie photos/videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints, which we dub "nerfies." We evaluate our method by collecting time-synchronized data using a rig with two mobile phones, yielding train/validation images of the same pose at different viewpoints. We show that our method faithfully reconstructs non-rigidly deforming scenes and reproduces unseen views with high fidelity.
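One common way to realize coarse-to-fine optimization for coordinate-based models is to gate the frequency bands of a sinusoidal positional encoding, opening higher frequencies as a schedule parameter alpha grows from 0 to the number of bands. A minimal sketch under that assumption; the cosine-shaped window below is an illustrative choice and the abstract does not specify it:

```python
import math
from typing import List

def band_weight(alpha: float, j: int) -> float:
    # Smoothly ramps frequency band j from 0 to 1 as alpha goes from j to j + 1.
    x = min(max(alpha - j, 0.0), 1.0)
    return 0.5 * (1.0 - math.cos(math.pi * x))

def windowed_positional_encoding(x: float, num_bands: int, alpha: float) -> List[float]:
    """Sinusoidal encoding of a scalar coordinate whose band j is scaled by
    band_weight(alpha, j). alpha = 0 keeps only the raw coordinate (coarse);
    alpha = num_bands enables every band (fine)."""
    features = [x]
    for j in range(num_bands):
        w = band_weight(alpha, j)
        freq = (2.0 ** j) * math.pi
        features.append(w * math.sin(freq * x))
        features.append(w * math.cos(freq * x))
    return features

# Usage: a hypothetical linear schedule that opens all 8 bands over 10k steps.
num_bands, ramp_steps = 8, 10_000
for step in (0, 2_500, 10_000):
    alpha = num_bands * min(step / ramp_steps, 1.0)
    enc = windowed_positional_encoding(0.3, num_bands, alpha)
```

Early in training the deformation network only sees low frequencies, which smooths the optimization landscape; later bands are blended in gradually rather than switched on abruptly.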

Submitted 9 September, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: ICCV 2021, Project page with videos: https://nerfies.github.io/

arXiv:2010.06007 [pdf, other]

The Cone of Silence: Speech Separation by Localization

Authors: Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman

Abstract: Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $θ\pm w/2$, given an angle of interest $θ$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
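The binary search over $w$ can be sketched as a recursive subdivision of angular windows: a window that contains a source is split into two half-width windows, and a window without one is pruned. In this minimal sketch the separation network is replaced by a hypothetical oracle `has_source(theta, width)`; in the actual method that decision would be derived from the network's separated output, which is not modeled here:

```python
from typing import Callable, List

# Hypothetical oracle: True if any source lies within theta +/- width/2 (degrees).
SourceOracle = Callable[[float, float], bool]

def localize(has_source: SourceOracle, theta: float, width: float, min_width: float) -> List[float]:
    """Recursively halve angular windows that contain a source until they are
    no wider than min_width; return the centers of the surviving windows."""
    if not has_source(theta, width):
        return []          # prune: no source here, so neither half is explored
    if width <= min_width:
        return [theta]
    quarter = width / 4.0  # each half-window is centered a quarter-width from theta
    return (localize(has_source, theta - quarter, width / 2.0, min_width)
            + localize(has_source, theta + quarter, width / 2.0, min_width))

def angular_distance(a: float, b: float) -> float:
    # Shortest distance between two angles on the circle, in degrees.
    return abs((a - b + 180.0) % 360.0 - 180.0)

# Usage with a toy oracle standing in for the separation network.
sources = [10.0, 100.0]
oracle = lambda theta, width: any(angular_distance(s, theta) <= width / 2.0 for s in sources)
print(localize(oracle, 180.0, 360.0, 5.625))  # [8.4375, 98.4375]
```

Because empty windows are pruned, the number of oracle calls grows roughly with the number of sources times log2(360 / min_width), rather than with the number of candidate angles.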

Submitted 12 October, 2020; originally announced October 2020.

Comments: 9 pages + references + supplementary. Oral presentation at NeurIPS 2020

arXiv:2007.13303 [pdf, other]

Reconstructing NBA Players

Authors: Luyang Zhu, Konstantinos Rematas, Brian Curless, Steve Seitz, Ira Kemelmacher-Shlizerman

Abstract: Great progress has been made in 3D body pose and shape estimation from a single photo. Yet, state-of-the-art results still suffer from errors due to challenging body poses, modeling clothing, and self-occlusions. The domain of basketball games is particularly challenging, as it exhibits all of these challenges. In this paper, we introduce a new approach for reconstruction of basketball players that outperforms the state-of-the-art. Key to our approach is a new method for creating poseable, skinned models of NBA players, and a large database of meshes (derived from the NBA2K19 video game), that we are releasing to the research community. Based on these models, we introduce a new method that takes as input a single photo of a clothed player in any basketball pose and outputs a high resolution mesh and 3D pose for that player. We demonstrate substantial improvement over state-of-the-art, single-image methods for body shape reconstruction.

Submitted 27 July, 2020; originally announced July 2020.

Comments: ECCV 2020

arXiv:2007.09209 [pdf, other]

People as Scene Probes

Authors: Yifan Wang, Brian Curless, Steve Seitz

Abstract: By analyzing the motion of people and other objects in a scene, we demonstrate how to infer depth, occlusion, lighting, and shadow information from video taken from a single camera viewpoint. This information is then used to composite new objects into the same scene with a high degree of automation and realism. In particular, when a user places a new object (2D cut-out) in the image, it is automatically rescaled, relit, and occluded properly, and it casts realistic shadows that point in the correct direction relative to the sun and conform properly to scene geometry. We demonstrate results (best viewed in supplementary video) on a range of scenes and compare to alternative methods for depth estimation and shadow compositing.

Submitted 17 July, 2020; originally announced July 2020.

Comments: ECCV 2020

arXiv:2004.00626 [pdf, other]

Background Matting: The World is Your Green Screen

Authors: Soumyadip Sengupta, Vivek Jayaram, Brian Curless, Steve Seitz, Ira Kemelmacher-Shlizerman

Abstract: We propose a method for creating a matte -- the per-pixel foreground color and alpha -- of a person by taking photos or videos in an everyday setting with a handheld camera. Most existing matting methods require a green screen background or a manually created trimap to produce a good matte. Automatic, trimap-free methods are appearing, but are not of comparable quality. In our trimap-free approach, we ask the user to take an additional photo of the background without the subject at the time of capture. This step requires a small amount of foresight but is far less time-consuming than creating a trimap. We train a deep network with an adversarial loss to predict the matte. We first train a matting network with a supervised loss on ground-truth data with synthetic composites. To bridge the domain gap to real imagery with no labeling, we train another matting network guided by the first network and by a discriminator that judges the quality of composites. We demonstrate results on a wide variety of photos and videos and show significant improvement over the state of the art.

Submitted 9 April, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

Comments: Accepted to CVPR 2020

arXiv:2001.04642 [pdf, other]

Seeing the World in a Bag of Chips

Authors: Jeong Joon Park, Aleksander Holynski, Steve Seitz

Abstract: We address the dual problems of novel view synthesis and environment reconstruction from hand-held RGBD sensors. Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone. In cases where the scene surface has a strong mirror-like material component, we generate highly detailed environment images, revealing room composition, objects, people, buildings, and trees visible through windows. Our approach achieves state-of-the-art view synthesis, operates on low dynamic range imagery, and is robust to geometric and calibration errors.

Submitted 15 June, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

Comments: CVPR 2020

arXiv:1909.11096 [pdf, other]

doi 10.3847/1538-4357/ab4db8

The KMOS^3D Survey: data release and final survey paper

Authors: E. Wisnioski, N. M. Förster Schreiber, M. Fossati, J. T. Mendel, D. Wilman, R. Genzel, R. Bender, S. Wuyts, R. L. Davies, H. Übler, K. Bandara, A. Beifiori, S. Belli, G. Brammer, J. Chan, R. I. Davies, M. Fabricius, A. Galametz, P. Lang, D. Lutz, E. J. Nelson, I. Momcheva, S. Price, D. Rosario, R. Saglia, et al. (6 additional authors not shown)

Abstract: We present the completed KMOS$^\mathrm{3D}$ survey $-$ an integral field spectroscopic survey of 739 galaxies with $\log(M_{\star}/M_{\odot})>9$ at $0.6<z<2.7$ using the K-band Multi Object Spectrograph (KMOS) at the Very Large Telescope (VLT). KMOS$^\mathrm{3D}$ provides a population-wide census of kinematics, star formation, outflows, and nebular gas conditions both on and off the star-forming galaxy main sequence through the spatially resolved and integrated properties of H$\alpha$, [N II], and [S II] emission lines. We detect H$\alpha$ emission for 91% of galaxies on the main sequence of star formation and 79% overall. The depth of the survey has allowed us to detect galaxies with star formation rates below 1 M$_{\odot}$ yr$^{-1}$, as well as to resolve 81% of detected galaxies with $\geq3$ resolution elements along the kinematic major axis. The detection fraction of H$\alpha$ is a strong function of both color and offset from the main sequence, with the detected and non-detected samples exhibiting different SED shapes. Comparison of H$\alpha$ and UV+IR star formation rates (SFRs) reveals that dust attenuation corrections may be underestimated by 0.5 dex at the highest masses ($\log(M_{\star}/M_{\odot})>10.5$). We confirm our first-year results of a high rotation-dominated fraction (monotonic velocity gradient and $v_\mathrm{rot}/\sigma_0 > \sqrt{3.36}$) of 77% for the full KMOS$^\mathrm{3D}$ H$\alpha$ sample.
The rotation-dominated fraction is a function of both stellar mass and redshift, with the strongest evolution measured over the redshift range of the survey for galaxies with $\log(M_{\star}/M_{\odot})<10.5$. With this paper we include a final data release of all 739 observed objects.

Submitted 24 September, 2019; originally announced September 2019.

Comments: 26 pages, 18 figures, 8 tables; re-submitted after minor revisions to ApJ; associated data release at: http://www.mpe.mpg.de/ir/KMOS3D

arXiv:1908.07732 [pdf, other]

KeystoneDepth: Visualizing History in 3D

Authors: Xuan Luo, Yanmeng Kong, Jason Lawrence, Ricardo Martin-Brualla, Steve Seitz

Abstract: This paper introduces KeystoneDepth to the research community: the largest and most diverse collection of rectified stereo image pairs, consisting of tens of thousands of stereographs of historical people, events, objects, and scenes from between 1860 and 1963. Leveraging the Keystone-Mast raw scans from the California Museum of Photography, we apply multiple processing steps to produce clean stereo image pairs, complete with calibration data, rectification transforms, and depth maps. A second contribution is a novel approach for view synthesis that runs at real-time rates on a mobile device, simulating the experience of looking through an open window into these historical scenes. We produce results for thousands of antique stereographs, capturing many important historical moments.

Submitted 19 September, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

Comments: Project website: http://roxanneluo.github.io/KeystoneDepth.html , Video: https://youtu.be/5JrX-KKisC8 , More results: http://roxanneluo.github.io/keystonedepth_supplementary/index.html

arXiv:1906.03539 [pdf, other]

Structure from Motion for Panorama-Style Videos

Authors: Chris Sweeney, Aleksander Holynski, Brian Curless, Steve M Seitz

Abstract: We present a novel Structure from Motion pipeline that is capable of reconstructing accurate camera poses for panorama-style video capture without prior camera intrinsic calibration. While panorama-style capture is common and convenient, previous reconstruction methods fail to obtain accurate reconstructions due to the rotation-dominant motion and small baseline between views. Our method is built on the assumption that the camera motion approximately corresponds to motion on a sphere, and we introduce three novel relative pose methods to estimate the fundamental matrix and camera distortion for spherical motion. These solvers are efficient and robust, and provide an excellent initialization for bundle adjustment. A soft prior on the camera poses is used to discourage large deviations from the spherical motion assumption when performing bundle adjustment, which allows cameras to remain properly constrained for optimization in the absence of well-triangulated 3D points. To validate the effectiveness of the proposed method, we evaluate our approach on both synthetic and real-world data, and demonstrate that camera poses are accurate enough for multiview stereo.

Submitted 8 June, 2019; originally announced June 2019.

arXiv:1905.12682 [pdf, other]

doi 10.1093/mnras/staa565

Dark Energy Survey Year 1 Results: Wide field mass maps via forward fitting in harmonic space

Authors: B. Mawdsley, D. Bacon, C. Chang, P. Melchior, E. Rozo, S. Seitz, N. Jeffrey, M. Gatti, E. Gaztanaga, D. Gruen, W. G. Hartley, B. Hoyle, S. Samuroff, E. Sheldon, M. A. Troxel, J. Zuntz, T. M. C. Abbott, J. Annis, E. Bertin, S. L. Bridle, D. Brooks, E. Buckley-Geer, D. L. Burke, A. Carnero Rosell, M. Carrasco Kind, et al. (41 additional authors not shown)

Abstract: We present new wide-field weak lensing mass maps for the Year 1 Dark Energy Survey data, generated via a forward fitting approach. This method of producing maps does not impose any prior constraints on the mass distribution to be reconstructed. The technique is found to improve the map reconstruction on the edges of the field compared to the conventional Kaiser-Squires method, which applies a direct inversion on the data; our approach is in good agreement with the previous direct approach in the central regions of the footprint. The mapping technique is assessed and verified with tests on simulations; together with the Kaiser-Squires method, the technique is then applied to the Dark Energy Survey Year 1 data, and the differences between the two methods are compared. We also produce the first DES measurements of the convergence Minkowski functionals and compare them to those measured in simulations.

Submitted 29 May, 2019; originally announced May 2019.

Comments: 19 pages, 16 figures. Submitted to MNRAS

arXiv:1904.13342 [pdf, other]

doi 10.1002/mp.13753

PYRO-NN: Python Reconstruction Operators in Neural Networks

Authors: Christopher Syben, Markus Michen, Bernhard Stimpel, Stephan Seitz, Stefan Ploner, Andreas K. Maier

Abstract: Purpose: Recently, several attempts have been made to transfer deep learning to medical image reconstruction. An increasing number of publications follow the concept of embedding the CT reconstruction as a known operator into a neural network. However, most of the approaches presented lack an efficient CT reconstruction framework fully integrated into deep learning environments. As a result, many approaches are forced to use workarounds for mathematically unambiguously solvable problems. Methods: PYRO-NN is a generalized framework to embed known operators into the prevalent deep learning framework TensorFlow. The current status includes state-of-the-art parallel-, fan- and cone-beam projectors and back-projectors accelerated with CUDA and provided as TensorFlow layers. On top, the framework provides a high-level Python API to conduct FBP and iterative reconstruction experiments with data from real CT systems. Results: The framework provides all necessary algorithms and tools to design end-to-end neural network pipelines with integrated CT reconstruction algorithms. The high-level Python API allows the layers to be used as simply as standard TensorFlow layers. To demonstrate the capabilities of the layers, the framework comes with three baseline experiments showing a cone-beam short-scan FDK reconstruction, a CT reconstruction filter learning setup, and a TV-regularized iterative reconstruction. All algorithms and tools are referenced to a scientific publication and are compared to existing non-deep-learning reconstruction frameworks.
The framework is available as open-source software at \url{https://github.com/csyben/PYRO-NN}. Conclusions: PYRO-NN integrates with the prevalent deep learning framework TensorFlow and allows users to set up end-to-end trainable neural networks in the medical image reconstruction context. We believe that the framework will be a step towards reproducible research.

Submitted 30 April, 2019; originally announced April 2019.

Comments: V1: Submitted to Medical Physics, 11 pages, 7 figures

arXiv:1812.05583 [pdf, other]

Scene Recomposition by Learning-based ICP

Authors: Hamid Izadinia, Steven M. Seitz

Abstract: By moving a depth sensor around a room, we compute a 3D CAD model of the environment, capturing the room shape and contents such as chairs, desks, sofas, and tables. Rather than reconstructing geometry, we match, place, and align each object in the scene to thousands of CAD models of objects. In addition to the fully automatic system, the key technical contribution is a novel approach for aligning CAD models to 3D scans, based on deep reinforcement learning. This approach, which we call Learning-based ICP (LICP), outperforms prior ICP methods in the literature by learning the best points to match and conditioning on object viewpoint. LICP learns to align using only synthetic data and does not require ground truth annotation of object pose or keypoint pair matching in real scene scans. While LICP is trained on synthetic data and without 3D real scene annotations, it outperforms both learned local deep feature matching and geometry-based alignment methods in real scenes. The proposed method is evaluated on the real-scene datasets SceneNN and ScanNet as well as the synthetic scenes of SUNCG. High-quality results are demonstrated on a range of real-world scenes, with robustness to clutter, viewpoint, and occlusion.

Submitted 7 April, 2020; v1 submitted 13 December, 2018; originally announced December 2018.

Comments: To appear at CVPR 2020

arXiv:1812.05116 [pdf, other]

doi 10.1093/mnras/stz2185

Dark Energy Survey Year 1 results: Validation of weak lensing cluster member contamination estimates from P(z) decomposition

Authors: T. N. Varga, J. DeRose, D. Gruen, T. McClintock, S. Seitz, E. Rozo, M. Costanzi, B. Hoyle, N. MacCrann, A. A. Plazas, E. S. Rykoff, M. Simet, A. von der Linden, R. H. Wechsler, J. Annis, S. Avila, E. Bertin, D. Brooks, E. Buckley-Geer, D. L. Burke, A. Carnero Rosell, M. Carrasco Kind, J. Carretero, C. E. Cunha, C. B. D'Andrea , et al. (46 additional authors not shown)

Abstract: Weak lensing source galaxy catalogs used in estimating the masses of galaxy clusters can be heavily contaminated by cluster members, prohibiting accurate mass calibration. In this study we test the performance of an estimator for the extent of cluster member contamination based on decomposing the photometric redshift $P(z)$ of source galaxies into contaminating and background components. We perform a full-scale mock analysis on a simulated sky survey approximately mirroring the observational properties of the Dark Energy Survey Year One observations (DES Y1), and find excellent agreement between the true number profile of contaminating cluster member galaxies in the simulation and the estimated one. We further apply the method to estimate the cluster member contamination for the DES Y1 redMaPPer cluster mass calibration analysis, and compare the results to an alternative approach based on the angular correlation of weak lensing source galaxies. We find indications that the correlation-based estimates are biased by the selection of the weak lensing sources in the cluster vicinity, which does not strongly impact the $P(z)$ decomposition method. Collectively, these benchmarks demonstrate the strength of the $P(z)$ decomposition method in alleviating membership contamination and enabling highly accurate cluster weak lensing studies without broad exclusion of source galaxies, thereby improving the total constraining power of cluster mass calibration via weak lensing.

Submitted 12 December, 2018; originally announced December 2018.

Comments: 14 pages, 8 figures; submitted to MNRAS

arXiv:1811.05029 [pdf, other]

LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

Authors: Ricardo Martin-Brualla, Rohit Pandey, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Julien Valentin, Sameh Khamis, Philip Davidson, Anastasia Tkach, Peter Lincoln, Adarsh Kowdle, Christoph Rhemann, Dan B Goldman, Cem Keskin, Steve Seitz, Shahram Izadi, Sean Fanello

Abstract: Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus on real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take the novel approach of augmenting such real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint and jointly performs completion, super resolution, and denoising of the imagery in real time. We call this approach neural (re-)rendering, and our live system "LookinGood". Our deep architecture is trained to produce high-resolution and high-quality images from a coarse rendering in real time. First, we propose a self-supervised training method that does not require manual ground-truth annotation. We contribute a specialized reconstruction error that uses semantic information to focus on relevant parts of the subject, e.g. the face. We also introduce a salient reweighting scheme of the loss function that is able to discard outliers. We specifically design the system for virtual and augmented reality headsets, where the consistency between the left and right eye plays a crucial role in the final user experience. Finally, we generate temporally stable results by explicitly minimizing the difference between two consecutive frames.
We tested the proposed system in two different scenarios: the first involving a single RGB-D sensor and upper-body reconstruction of an actor, and the second consisting of full-body 360-degree capture. Through extensive experimentation, we demonstrate how our system generalizes across unseen sequences and subjects. The supplementary video is available at http://youtu.be/Md3tdAKoLGU.

Submitted 12 November, 2018; originally announced November 2018.

Comments: The supplementary video is available at: http://youtu.be/Md3tdAKoLGU ; to be presented at SIGGRAPH Asia 2018

arXiv:1809.09761 [pdf, other]

doi 10.1145/3272127.3275066

PhotoShape: Photorealistic Materials for Large-Scale Shape Collections

Authors: Keunhong Park, Konstantinos Rematas, Ali Farhadi, Steven M. Seitz

Abstract: Existing online 3D shape repositories contain thousands of 3D models but lack photorealistic appearance. We present an approach to automatically assign high-quality, realistic appearance models to large-scale 3D shape collections. The key idea is to jointly leverage three types of online data (shape collections, material collections, and photo collections), using the photos as reference to guide assignment of materials to shapes. By generating a large number of synthetic renderings, we train a convolutional neural network to classify materials in real photos, and employ 3D-2D alignment techniques to transfer materials to different parts of each shape model. Our system produces photorealistic, relightable 3D shapes (PhotoShapes).

Submitted 25 September, 2018; originally announced September 2018.

Comments: To be presented at SIGGRAPH Asia 2018. Project page: https://keunhong.com/publications/photoshape/

arXiv:1809.02057 [pdf]

Surface Light Field Fusion

Authors: Jeong Joon Park, Richard Newcombe, Steve Seitz

Abstract: We present an approach for interactively scanning highly reflective objects with a commodity RGBD sensor. In addition to shape, our approach models the surface light field, encoding scene appearance from all directions. By factoring the surface light field into view-independent and wavelength-independent components, we arrive at a representation that can be robustly estimated with IR-equipped commodity depth sensors, and achieves high quality results.

Submitted 6 September, 2018; originally announced September 2018.

Comments: Project Website: http://grail.cs.washington.edu/projects/slfusion/

Journal ref: 3DV 2018

arXiv:1807.08753 [pdf, other]

doi 10.3847/1538-4357/aad4a1

M31 PAndromeda Cepheid sample observed in four HST bands

Authors: Mihael Kodric, Arno Riffeser, Stella Seitz, Ulrich Hopp, Jan Snigula, Claus Goessl, Johannes Koppenhoefer, Ralf Bender

Abstract: Using the M31 PAndromeda Cepheid sample and the HST PHAT data, we obtain the largest Cepheid sample in M31 with HST data in four bands. For our analysis we consider three samples: a very homogeneous sample of Cepheids based on the PAndromeda data, the mean-magnitude-corrected PAndromeda sample, and a sample complementing the PAndromeda sample with Cepheids from the literature. The latter results in the largest catalog, with 522 fundamental mode (FM) Cepheids and 102 first overtone (FO) Cepheids with F160W and F110W data and 559 FM Cepheids and 111 FO Cepheids with F814W and F475W data. The obtained dispersion of the Period-Luminosity relations (PLRs) is very small (e.g. 0.138 mag in the F160W sample I PLR). We find no broken slope in the PLRs when analyzing our entire sample, but we do identify a subsample of Cepheids that causes the broken slope. However, this effect only appears when Cepheids of this type make up a significant fraction of the total sample. We also analyze the effect of sample selection on the Hubble constant.

Submitted 23 July, 2018; originally announced July 2018.

Comments: 32 pages, 19 figures, 9 tables, accepted for publication in ApJ, electronic data will be available on CDS

arXiv:1806.10614 [pdf, other]

doi 10.1093/mnras/stz817

The Wendelstein Weak Lensing (WWL) pathfinder: Accurate weak lensing masses for Planck clusters

Authors: Romy Louise Rehmann, Daniel Gruen, Stella Seitz, Ralf Bender, Arno Riffeser, Matthias Kluge, Claus Goessl, Ulrich Hopp, Annalisa Mana, Christoph Ries, Michael Schmidt

Abstract: We present results from the Wendelstein Weak Lensing (WWL) pathfinder project, in which we have observed three intermediate-redshift Planck clusters of galaxies with the new $30' \times 30'$ wide-field imager at the 2m Fraunhofer Telescope at Wendelstein Observatory. We investigate the presence of biases in our shear catalogues and estimate their impact on our weak lensing mass estimates. The overall calibration uncertainty depends on the cluster redshift and is below 8.1-15 per cent for $z \approx 0.27-0.77$. It will decrease with improvements in the background sample selection and the multiplicative shear bias calibration. We present the first weak lensing mass estimates for PSZ1 G109.88+27.94 and PSZ1 G139.61+24.20, two SZ-selected cluster candidates. Based on Wendelstein colors and SDSS photometry, we find that the redshift of PSZ1 G109.88+27.94 has to be corrected to $z \approx 0.77$. We investigate the influence of line-of-sight structures on the weak lensing mass estimates and find upper limits for two groups in each of the fields of PSZ1 G109.88+27.94 and PSZ1 G186.98+38.66. We compare our results to SZ and dynamical mass estimates from the literature and, in the case of PSZ1 G186.98+38.66, to previous weak lensing mass estimates. We conclude that our pathfinder project demonstrates that weak lensing cluster masses can be accurately measured with the 2m Fraunhofer Telescope.

Submitted 27 June, 2018; originally announced June 2018.

Comments: 24 pages, 26 figures. Submitted to MNRAS

arXiv:1806.07895 [pdf, other]

doi 10.3847/1538-3881/aad40f

Cepheids in M31 - The PAndromeda Cepheid sample

Authors: Mihael Kodric, Arno Riffeser, Ulrich Hopp, Claus Goessl, Stella Seitz, Ralf Bender, Johannes Koppenhoefer, Christian Obermeier, Jan Snigula, Chien-Hsiu Lee, W. S. Burgett, P. W. Draper, K. W. Hodapp, N. Kaiser, R. -P. Kudritzki, N. Metcalfe, J. L. Tonry, R. J. Wainscoat

Abstract: We present the largest Cepheid sample in M31 based on the complete Pan-STARRS1 survey of Andromeda (PAndromeda) in the $r_{\mathrm{P1}}$, $i_{\mathrm{P1}}$, and $g_{\mathrm{P1}}$ bands. We find 2686 Cepheids: 1662 fundamental mode Cepheids, 307 first-overtone Cepheids, 278 type II Cepheids, and 439 Cepheids of undetermined type. Using the method developed by Kodric et al. (2013), we identify Cepheids by using a three-dimensional parameter space of Fourier parameters of the Cepheid light curves combined with a color cut and other selection criteria. This is an unbiased approach to identifying Cepheids and results in a homogeneous Cepheid sample. The Period-Luminosity relations obtained for our sample have smaller dispersions than in our previous work. We find a broken slope that we previously observed with HST data in Kodric et al. (2015), albeit with a lower significance.

Submitted 20 June, 2018; originally announced June 2018.

Comments: 79 pages, 39 figures, 8 tables, accepted for publication in AJ, K18b is submitted to ApJ, electronic data will be available on CDS

arXiv:1806.00890 [pdf, other]

Soccer on Your Tabletop

Authors: Konstantinos Rematas, Ira Kemelmacher-Shlizerman, Brian Curless, Steve Seitz

Abstract: We present a system that transforms a monocular video of a soccer game into a moving 3D reconstruction, in which the players and field can be rendered interactively with a 3D viewer or through an Augmented Reality device. At the heart of our paper is an approach to estimate the depth map of each player, using a CNN that is trained on 3D player data extracted from soccer video games. We compare with state-of-the-art body pose and depth estimation techniques, and show results on both synthetic ground truth benchmarks and real YouTube soccer footage.

Submitted 3 June, 2018; originally announced June 2018.

Comments: CVPR'18. Project: http://grail.cs.washington.edu/projects/soccer/

Showing 1–50 of 194 results for author: Seitz, S