-
Multimodal Conditional 3D Face Geometry Generation
Authors:
Christopher Otto,
Prashanth Chandran,
Sebastian Weiss,
Markus Gross,
Gaspard Zoss,
Derek Bradley
Abstract:
We present a new method for multimodal conditional 3D face geometry generation that allows user-friendly control over the output identity and expression via a number of different conditioning signals. Within a single model, we demonstrate 3D faces generated from artistic sketches, 2D face landmarks, Canny edges, FLAME face model parameters, portrait photos, or text prompts. Our approach is based on a diffusion process that generates 3D geometry in a 2D parameterized UV domain. Geometry generation passes each conditioning signal through a set of cross-attention layers (IP-Adapter), one set for each user-defined conditioning signal. The result is an easy-to-use 3D face generation tool that produces high resolution geometry with fine-grain user control.
Submitted 1 July, 2024;
originally announced July 2024.
-
Lossy Image Compression with Foundation Diffusion Models
Authors:
Lucas Relic,
Roberto Azevedo,
Markus Gross,
Christopher Schroers
Abstract:
Incorporating diffusion models in the image compression domain has the potential to produce realistic and detailed reconstructions, especially at extremely low bitrates. Previous methods focus on using diffusion models as expressive decoders robust to quantization errors in the conditioning signals, yet achieving competitive results in this manner requires costly training of the diffusion model and long inference times due to the iterative generative process. In this work we formulate the removal of quantization error as a denoising task, using diffusion to recover lost information in the transmitted image latent. Our approach allows us to perform less than 10% of the full diffusion generative process and requires no architectural changes to the diffusion model, enabling the use of foundation models as a strong prior without additional fine tuning of the backbone. Our proposed codec outperforms previous methods in quantitative realism metrics, and we verify that our reconstructions are qualitatively preferred by end users, even when other methods use twice the bitrate.
Submitted 12 April, 2024;
originally announced April 2024.
-
Learning a Generalized Physical Face Model From Data
Authors:
Lingchen Yang,
Gaspard Zoss,
Prashanth Chandran,
Markus Gross,
Barbara Solenthaler,
Eftychios Sifakis,
Derek Bradley
Abstract:
Physically-based simulation is a powerful approach for 3D facial animation as the resulting deformations are governed by physical constraints, allowing to easily resolve self-collisions, respond to external forces and perform realistic anatomy edits. Today's methods are data-driven, where the actuations for finite elements are inferred from captured skin geometry. Unfortunately, these approaches have not been widely adopted due to the complexity of initializing the material space and learning the deformation model for each character separately, which often requires a skilled artist followed by lengthy network training. In this work, we aim to make physics-based facial animation more accessible by proposing a generalized physical face model that we learn from a large 3D face dataset in a simulation-free manner. Once trained, our model can be quickly fit to any unseen identity and produce a ready-to-animate physical face model automatically. Fitting is as easy as providing a single 3D face scan, or even a single face image. After fitting, we offer intuitive animation controls, as well as the ability to retarget animations across characters. All the while, the resulting animations allow for physical effects like collision avoidance, gravity, paralysis, bone reshaping and more.
Submitted 29 February, 2024;
originally announced February 2024.
-
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Authors:
Rui Song,
Chenwei Liang,
Hu Cao,
Zhiran Yan,
Walter Zimmer,
Markus Gross,
Andreas Festag,
Alois Knoll
Abstract:
Collaborative perception in automated vehicles leverages the exchange of information between agents, aiming to elevate perception results. Previous camera-based collaborative 3D perception methods typically employ 3D bounding boxes or bird's eye views as representations of the environment. However, these approaches fall short in offering a comprehensive 3D environmental prediction. To bridge this gap, we introduce the first method for collaborative 3D semantic occupancy prediction. Particularly, it improves local 3D semantic occupancy predictions by hybrid fusion of (i) semantic and occupancy task features, and (ii) compressed orthogonal attention features shared between vehicles. Additionally, due to the lack of a collaborative perception dataset designed for semantic occupancy prediction, we augment a current collaborative perception dataset to include 3D collaborative semantic occupancy labels for a more robust evaluation. The experimental findings highlight that: (i) our collaborative semantic occupancy predictions excel above the results from single vehicles by over 30%, and (ii) models anchored on semantic occupancy outpace state-of-the-art collaborative 3D detection techniques in subsequent perception applications, showcasing enhanced accuracy and enriched semantic-awareness in road environments.
Submitted 25 April, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
An Implicit Physical Face Model Driven by Expression and Style
Authors:
Lingchen Yang,
Gaspard Zoss,
Prashanth Chandran,
Paulo Gotardo,
Markus Gross,
Barbara Solenthaler,
Eftychios Sifakis,
Derek Bradley
Abstract:
3D facial animation is often produced by manipulating facial deformation models (or rigs), that are traditionally parameterized by expression controls. A key component that is usually overlooked is expression 'style', as in, how a particular expression is performed. Although it is common to define a semantic basis of expressions that characters can perform, most characters perform each expression in their own style. To date, style is usually entangled with the expression, and it is not possible to transfer the style of one character to another when considering facial animation. We present a new face model, based on a data-driven implicit neural physics model, that can be driven by both expression and style separately. At the core, we present a framework for learning implicit physics-based actuations for multiple subjects simultaneously, trained on a few arbitrary performance capture sequences from a small set of identities. Once trained, our method allows generalized physics-based facial animation for any of the trained identities, extending to unseen performances. Furthermore, it grants control over the animation style, enabling style transfer from one character to another or blending styles of different characters. Lastly, as a physics-based model, it is capable of synthesizing physical effects, such as collision handling, setting our method apart from conventional approaches.
Submitted 27 January, 2024;
originally announced January 2024.
-
Implicit Neural Representation for Physics-driven Actuated Soft Bodies
Authors:
Lingchen Yang,
Byungsoo Kim,
Gaspard Zoss,
Baran Gözcü,
Markus Gross,
Barbara Solenthaler
Abstract:
Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks. Our key contribution is a general and implicit formulation to control active soft bodies by defining a function that enables a continuous mapping from a spatial point in the material space to the actuation value. This property allows us to capture the signal's dominant frequencies, making the method discretization agnostic and widely applicable. We extend our implicit model to mandible kinematics for the particular case of facial animation and show that we can reliably reproduce facial expressions captured with high-quality capture systems. We apply the method to volumetric soft bodies, human poses, and facial expressions, demonstrating artist-friendly properties, such as simple control over the latent space and resolution invariance at test time.
Submitted 26 January, 2024;
originally announced January 2024.
-
Artist-Friendly Relightable and Animatable Neural Heads
Authors:
Yingyan Xu,
Prashanth Chandran,
Sebastian Weiss,
Markus Gross,
Gaspard Zoss,
Derek Bradley
Abstract:
An increasingly common approach for creating photo-realistic digital avatars is through the use of volumetric neural fields. The original neural radiance field (NeRF) allowed for impressive novel view synthesis of static heads when trained on a set of multi-view images, and follow up methods showed that these neural representations can be extended to dynamic avatars. Recently, new variants also surpassed the usual drawback of baked-in illumination in neural representations, showing that static neural avatars can be relit in any environment. In this work we simultaneously tackle both the motion and illumination problem, proposing a new method for relightable and animatable neural heads. Our method builds on a proven dynamic avatar approach based on a mixture of volumetric primitives, combined with a recently-proposed lightweight hardware setup for relightable neural fields, and includes a novel architecture that allows relighting dynamic neural avatars performing unseen expressions in any environment, even with nearfield illumination and viewpoints.
Submitted 6 December, 2023;
originally announced December 2023.
-
Spatially Adaptive Cloth Regression with Implicit Neural Representations
Authors:
Lei Shu,
Vinicius Azevedo,
Barbara Solenthaler,
Markus Gross
Abstract:
The accurate representation of fine-detailed cloth wrinkles poses significant challenges in computer graphics. The inherently non-uniform structure of cloth wrinkles mandates the employment of intricate discretization strategies, which are frequently characterized by high computational demands and complex methodologies. Addressing this, the research introduced in this paper elucidates a novel anisotropic cloth regression technique that capitalizes on the potential of implicit neural representations of surfaces. Our first core contribution is an innovative mesh-free sampling approach, crafted to reduce the reliance on traditional mesh structures, thereby offering greater flexibility and accuracy in capturing fine cloth details. Our second contribution is a novel adversarial training scheme, which is designed meticulously to strike a harmonious balance between the sampling and simulation objectives. The adversarial approach ensures that the wrinkles are represented with high fidelity, while also maintaining computational efficiency. Our results showcase through various cloth-object interaction scenarios that our method, given the same memory constraints, consistently surpasses traditional discrete representations, particularly when modelling highly-detailed localized wrinkles.
Submitted 27 November, 2023;
originally announced November 2023.
-
Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
Authors:
Markus Gross,
Arne P. Raulf,
Christoph Räth
Abstract:
We investigate the stationary (late-time) training regime of single- and two-layer underparameterized linear neural networks within the continuum limit of stochastic gradient descent (SGD) for synthetic Gaussian data. In the case of a single-layer network in the weakly underparameterized regime, the spectrum of the noise covariance matrix deviates notably from the Hessian, which can be attributed to the broken detailed balance of SGD dynamics. The weight fluctuations are in this case generally anisotropic, but effectively experience an isotropic loss. For an underparameterized two-layer network, we describe the stochastic dynamics of the weights in each layer and analyze the associated stationary covariances. We identify the inter-layer coupling as a distinct source of anisotropy for the weight fluctuations. In contrast to the single-layer case, the weight fluctuations are effectively subject to an anisotropic loss, the flatness of which is inversely related to the fluctuation variance. We thereby provide an analytical derivation of the recently observed inverse variance-flatness relation in a model of a deep linear neural network.
Submitted 23 June, 2024; v1 submitted 23 November, 2023;
originally announced November 2023.
-
GroomGen: A High-Quality Generative Hair Model Using Hierarchical Latent Representations
Authors:
Yuxiao Zhou,
Menglei Chai,
Alessandro Pepe,
Markus Gross,
Thabo Beeler
Abstract:
Despite recent successes in hair acquisition that fits a high-dimensional hair model to a specific input subject, generative hair models, which establish general embedding spaces for encoding, editing, and sampling diverse hairstyles, are way less explored. In this paper, we present GroomGen, the first generative model designed for hair geometry composed of highly-detailed dense strands. Our approach is motivated by two key ideas. First, we construct hair latent spaces covering both individual strands and hairstyles. The latent spaces are compact, expressive, and well-constrained for high-quality and diverse sampling. Second, we adopt a hierarchical hair representation that parameterizes a complete hair model to three levels: single strands, sparse guide hairs, and complete dense hairs. This representation is critical to the compactness of latent spaces, the robustness of training, and the efficiency of inference. Based on this hierarchical latent representation, our proposed pipeline consists of a strand-VAE and a hairstyle-VAE that encode an individual strand and a set of guide hairs to their respective latent spaces, and a hybrid densification step that populates sparse guide hairs to a dense hair model. GroomGen not only enables novel hairstyle sampling and plausible hairstyle interpolation, but also supports interactive editing of complex hairstyles, or can serve as strong data-driven prior for hairstyle reconstruction from images. We demonstrate the superiority of our approach with qualitative examples of diverse sampled hairstyles and quantitative evaluation of generation quality regarding every single component and the entire pipeline.
Submitted 16 November, 2023; v1 submitted 3 November, 2023;
originally announced November 2023.
-
A Perceptual Shape Loss for Monocular 3D Face Reconstruction
Authors:
Christopher Otto,
Prashanth Chandran,
Gaspard Zoss,
Markus Gross,
Paulo Gotardo,
Derek Bradley
Abstract:
Monocular 3D face reconstruction is a wide-spread topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case carefully-designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image space and is thus agnostic to mesh topology. We show how our new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.
Submitted 30 October, 2023;
originally announced October 2023.
-
Middle-mile optimization for next-day delivery
Authors:
Konstantinos Benidis,
Georgios Paschos,
Martin Gross,
George Iosifidis
Abstract:
We consider an e-commerce retailer operating a supply chain that consists of middle- and last-mile transportation, and study its ability to deliver products stored in warehouses within a day of the customer's order time. Successful next-day delivery requires inventory availability and timely truck schedules in the middle-mile; in this paper we assume a fixed inventory position and focus on optimizing the middle-mile. We formulate a novel optimization problem which decides the departure of the last middle-mile truck at each (potential) network connection in order to maximize the number of next-day deliveries. We show that the respective next-day delivery optimization is a combinatorial problem that is NP-hard to approximate within $(1-1/e)\cdot\texttt{opt}\approx 0.632\cdot\texttt{opt}$, hence every retailer that offers one-day deliveries has to deal with this complexity barrier. We study three variants of the problem motivated by operational constraints that different retailers encounter, and propose solution schemes tailored to each problem's properties. To that end, we rely on greedy submodular maximization, pipage rounding techniques, and Lagrangian heuristics. The algorithms are scalable, offer optimality gap guarantees, and, when evaluated on realistic datasets and network scenarios, were found to achieve near-optimal results.
Submitted 27 October, 2023;
originally announced October 2023.
-
DualStream: Spatially Sharing Selves and Surroundings using Mobile Devices and Augmented Reality
Authors:
Rishi Vanukuru,
Suibi Che-Chuan Weng,
Krithik Ranjan,
Torin Hopkins,
Amy Banic,
Mark D. Gross,
Ellen Yi-Luen Do
Abstract:
In-person human interaction relies on our spatial perception of each other and our surroundings. Current remote communication tools partially address each of these aspects. Video calls convey real user representations but without spatial interactions. Augmented and Virtual Reality (AR/VR) experiences are immersive and spatial but often use virtual environments and characters instead of real-life representations. Bridging these gaps, we introduce DualStream, a system for synchronous mobile AR remote communication that captures, streams, and displays spatial representations of users and their surroundings. DualStream supports transitions between user and environment representations with different levels of visuospatial fidelity, as well as the creation of persistent shared spaces using environment snapshots. We demonstrate how DualStream can enable spatial communication in real-world contexts, and support the creation of blended spaces for collaboration. A formative evaluation of DualStream revealed that users valued the ability to interact spatially and move between representations, and could see DualStream fitting into their own remote communication practices in the near future. Drawing from these findings, we discuss new opportunities for designing more widely accessible spatial communication tools, centered around the mobile phone.
Submitted 2 September, 2023;
originally announced September 2023.
-
Controllable Inversion of Black-Box Face Recognition Models via Diffusion
Authors:
Manuel Kansy,
Anton Raël,
Graziana Mignone,
Jacek Naruniec,
Christopher Schroers,
Markus Gross,
Romann M. Weber
Abstract:
Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e. black-box setting). A variety of methods have been proposed in literature for this task, but they have serious shortcomings such as a lack of realistic outputs and strong requirements for the data set and accessibility of the face recognition model. By analyzing the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in terms of identity preservation and diversity both qualitatively and quantitatively, and our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process.
Submitted 30 September, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Microdosing: Knowledge Distillation for GAN based Compression
Authors:
Leonhard Helminger,
Roberto Azevedo,
Abdelaziz Djelouah,
Markus Gross,
Christopher Schroers
Abstract:
Recently, significant progress has been made in learned image and video compression. In particular the usage of Generative Adversarial Networks has led to impressive results in the low bit rate regime. However, the model size remains an important issue in current state-of-the-art proposals and existing solutions require significant computation effort on the decoding side. This limits their usage in realistic scenarios and the extension to video compression. In this paper, we demonstrate how to leverage knowledge distillation to obtain equally capable image decoders at a fraction of the original number of parameters. We investigate several aspects of our solution including sequence specialization with side information for image coding. Finally, we also show how to transfer the obtained benefits into the setting of video compression. Overall, this allows us to reduce the model size by a factor of 20 and to achieve 50% reduction in decoding time.
Submitted 7 January, 2022;
originally announced January 2022.
-
Automating Speedrun Routing: Overview and Vision
Authors:
Matthias Groß,
Dietlind Zühlke,
Boris Naujoks
Abstract:
Speedrunning in general means to play a video game fast, i.e. using all means at one's disposal to achieve a given goal in the least amount of time possible. To do so, a speedrun must be planned in advance, or routed, as referred to by the community. This paper focuses on discovering challenges and defining models needed when trying to approach the problem of routing algorithmically. To do so, this paper is split in two parts. The first part provides an overview of relevant speedrunning literature, extracting vital information and formulating criticism. Important categorizations are pointed out and a nomenclature is built to support professional discussion. The second part of this paper then refers to the actual speedrun routing optimization problem. Different concepts of graph representations are presented and their potential is discussed. Visions both for problem modeling as well as solving are presented and assessed regarding suitability and expected challenges. Finally, a first assessment of the applicability of existing optimization methods to the defined problem is made, including metaheuristics/EA and Deep Learning methods.
Submitted 21 April, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
DuctTake: Spatiotemporal Video Compositing
Authors:
Jan Rueegg,
Oliver Wang,
Aljoscha Smolic,
Markus Gross
Abstract:
DuctTake is a system designed to enable practical compositing of multiple takes of a scene into a single video. Current industry solutions are based around object segmentation, a hard problem that requires extensive manual input and cleanup, making compositing an expensive part of the film-making process. Our method instead composites shots together by finding optimal spatiotemporal seams using motion-compensated 3D graph cuts through the video volume. We describe in detail the required components, decisions, and new techniques that together make a usable, interactive tool for compositing HD video, paying special attention to running time and performance of each section. We validate our approach by presenting a wide variety of examples and by comparing result quality and creation time to composites made by professional artists using current state-of-the-art tools.
Submitted 12 January, 2021;
originally announced January 2021.
-
Blind Image Restoration with Flow Based Priors
Authors:
Leonhard Helminger,
Michael Bernasconi,
Abdelaziz Djelouah,
Markus Gross,
Christopher Schroers
Abstract:
Image restoration has seen great progress in the last years thanks to the advances in deep neural networks. Most of these existing techniques are trained using full supervision with suitable image pairs to tackle a specific degradation. However, in a blind setting with unknown degradations this is not possible and a good prior remains crucial. Recently, neural network based approaches have been proposed to model such priors by leveraging either denoising autoencoders or the implicit regularization captured by the neural network structure itself. In contrast to this, we propose using normalizing flows to model the distribution of the target content and to use this as a prior in a maximum a posteriori (MAP) formulation. By expressing the MAP optimization process in the latent space through the learned bijective mapping, we are able to obtain solutions through gradient descent. To the best of our knowledge, this is the first work that explores normalizing flows as prior in image enhancement problems. Furthermore, we present experimental results for a number of different degradations on data sets varying in complexity and show competitive results when comparing with the deep image prior approach.
Submitted 9 September, 2020;
originally announced September 2020.
-
Lossy Image Compression with Normalizing Flows
Authors:
Leonhard Helminger,
Abdelaziz Djelouah,
Markus Gross,
Christopher Schroers
Abstract:
Deep learning based image compression has recently witnessed exciting progress and in some cases even managed to surpass transform coding based approaches that have been established and refined over many decades. However, state-of-the-art solutions for deep image compression typically employ autoencoders which map the input to a lower dimensional latent space and thus irreversibly discard information already before quantization. Due to that, they inherently limit the range of quality levels that can be covered. In contrast, traditional approaches in image compression allow for a larger range of quality levels. Interestingly, they employ an invertible transformation before performing the quantization step which explicitly discards information. Inspired by this, we propose a deep image compression method that is able to go from low bit-rates to near lossless quality by leveraging normalizing flows to learn a bijective mapping from the image space to a latent representation. In addition to this, we demonstrate further advantages unique to our solution, such as the ability to maintain constant quality results through re-encoding, even when performed multiple times. To the best of our knowledge, this is the first work to explore the opportunities for leveraging normalizing flows for lossy image compression.
Submitted 24 August, 2020;
originally announced August 2020.
-
RoomShift: Room-scale Dynamic Haptics for VR with Furniture-moving Swarm Robots
Authors:
Ryo Suzuki,
Hooman Hedayati,
Clement Zheng,
James Bohn,
Daniel Szafir,
Ellen Yi-Luen Do,
Mark D. Gross,
Daniel Leithinger
Abstract:
RoomShift is a room-scale dynamic haptic environment for virtual reality, using a small swarm of robots that can move furniture. RoomShift consists of nine shape-changing robots: Roombas with mechanical scissor lifts. These robots drive beneath a piece of furniture to lift, move and place it. By augmenting virtual scenes with physical objects, users can sit on, lean against, place and otherwise interact with furniture with their whole body; just as in the real world. When the virtual scene changes or users navigate within it, the swarm of robots dynamically reconfigures the physical environment to match the virtual content. We describe the hardware and software implementation, applications in virtual tours and architectural design and interaction techniques.
Submitted 19 August, 2020;
originally announced August 2020.
-
Enriching Video Captions With Contextual Text
Authors:
Philipp Rimle,
Pelin Dogan,
Markus Gross
Abstract:
Understanding video content and generating captions with context is an important and challenging task. Unlike prior methods that typically attempt to generate generic video captions without context, our architecture contextualizes captioning by infusing extracted information from relevant text data. We propose an end-to-end sequence-to-sequence model which generates video captions based on visual input, and mines relevant knowledge such as names and locations from contextual text. In contrast to previous approaches, we do not preprocess the text further, and let the model directly learn to attend over it. Guided by the visual input, the model is able to copy words from the contextual text via a pointer-generator network, allowing it to produce more specific video captions. We show competitive performance on the News Video Dataset and, through ablation studies, validate the efficacy of contextual video captioning as well as individual design choices in our model architecture.
Submitted 29 July, 2020;
originally announced July 2020.
-
Shapley Value as Principled Metric for Structured Network Pruning
Authors:
Marco Ancona,
Cengiz Öztireli,
Markus Gross
Abstract:
Structured pruning is a well-known technique to reduce the storage size and inference cost of neural networks. The usual pruning pipeline consists of ranking the network's internal filters and activations with respect to their contributions to the network performance, removing the units with the lowest contribution, and fine-tuning the network to reduce the harm induced by pruning. Recent results showed that random pruning performs on par with other metrics, given enough fine-tuning resources. In this work, we show that this is not true in a low-data regime when fine-tuning is either not possible or not effective. In this case, reducing the harm caused by pruning becomes crucial to retain the performance of the network. First, we analyze the problem of estimating the contribution of hidden units with tools suggested by cooperative game theory and propose Shapley values as a principled ranking metric for this task. We compare with several alternatives proposed in the literature and discuss how Shapley values are theoretically preferable. Finally, we compare all ranking metrics on the challenging scenario of low-data pruning, where we demonstrate how Shapley values outperform other heuristics.
Submitted 2 June, 2020;
originally announced June 2020.
-
Lagrangian Neural Style Transfer for Fluids
Authors:
Byungsoo Kim,
Vinicius C. Azevedo,
Markus Gross,
Barbara Solenthaler
Abstract:
Artistically controlling the shape, motion and appearance of fluid simulations poses major challenges in visual effects production. In this paper, we present a neural style transfer approach from images to 3D fluids formulated in a Lagrangian viewpoint. Using particles for style transfer has unique benefits compared to grid-based techniques. Attributes are stored on the particles and hence are trivially transported by the particle motion. This intrinsically ensures temporal consistency of the optimized stylized structure and notably improves the resulting quality. Simultaneously, the expensive, recursive alignment of stylization velocity fields of grid approaches is unnecessary, reducing the computation time to less than an hour and rendering neural flow stylization practical in production settings. Moreover, the Lagrangian representation improves artistic control as it allows for multi-fluid stylization and consistent color transfer from images, and the generality of the method enables stylization of smoke and liquids likewise.
Submitted 2 May, 2020;
originally announced May 2020.
-
LiftTiles: Constructive Building Blocks for Prototyping Room-scale Shape-changing Interfaces
Authors:
Ryo Suzuki,
Ryosuke Nakayama,
Dan Liu,
Yasuaki Kakehi,
Mark D. Gross,
Daniel Leithinger
Abstract:
Large-scale shape-changing interfaces have great potential, but creating such systems requires substantial time, cost, space, and effort, which hinders the research community from exploring interactions beyond the scale of human hands. We introduce modular inflatable actuators as building blocks for prototyping room-scale shape-changing interfaces. Each actuator can change its height from 15 cm to 150 cm, actuated and controlled by air pressure. Each unit is low-cost (8 USD), lightweight (10 kg), compact (15 cm), and robust, making it well-suited for prototyping room-scale shape transformations. Moreover, our modular and reconfigurable design allows researchers and designers to quickly construct different geometries and to explore various applications. This paper contributes to the design and implementation of highly extendable inflatable actuators, and demonstrates a range of scenarios that can leverage this modular building block.
Submitted 8 January, 2020;
originally announced January 2020.
-
A blockchain-orchestrated Federated Learning architecture for healthcare consortia
Authors:
Jonathan Passerat-Palmbach,
Tyler Farnan,
Robert Miller,
Marielle S. Gross,
Heather Leigh Flannery,
Bill Gleim
Abstract:
We propose a novel architecture for federated learning within healthcare consortia. At the heart of the solution is a unique integration of privacy preserving technologies, built upon native enterprise blockchain components available in the Ethereum ecosystem. We show how the specific characteristics and challenges of healthcare consortia informed our design choices, notably the conception of a new Secure Aggregation protocol assembled with a protected hardware component and an encryption toolkit native to Ethereum. Our architecture also brings in a privacy preserving audit trail that logs events in the network without revealing identities.
Submitted 12 October, 2019;
originally announced October 2019.
-
ShapeBots: Shape-changing Swarm Robots
Authors:
Ryo Suzuki,
Clement Zheng,
Yasuaki Kakehi,
Tom Yeh,
Ellen Yi-Luen Do,
Mark D. Gross,
Daniel Leithinger
Abstract:
We introduce shape-changing swarm robots. A swarm of self-transformable robots can both individually and collectively change their configuration to display information, actuate objects, act as tangible controllers, visualize data, and provide physical affordances. ShapeBots is a concept prototype of shape-changing swarm robots. Each robot can change its shape by leveraging small linear actuators that are thin (2.5 cm) and highly extendable (up to 20cm) in both horizontal and vertical directions. The modular design of each actuator enables various shapes and geometries of self-transformation. We illustrate potential application scenarios and discuss how this type of interface opens up possibilities for the future of ubiquitous and distributed shape-changing interfaces.
Submitted 7 September, 2019;
originally announced September 2019.
-
Data-Driven Physical Face Inversion
Authors:
Yeara Kozlov,
Hongyi Xu,
Moritz Bächer,
Derek Bradley,
Markus Gross,
Thabo Beeler
Abstract:
Facial animation is one of the most challenging problems in computer graphics, and it is often solved using linear heuristics like blend-shape rigging. More expressive approaches like physical simulation have emerged, but these methods are very difficult to tune, especially when simulating a real actor's face. We propose to use a simple finite element simulation approach for face animation, and present a novel method for recovering the required simulation parameters in order to best match a real actor's face motion. Our method involves reconstructing a very small number of head poses of the actor in 3D, where the head poses span different configurations of force directions due to gravity. Our algorithm can then automatically recover both the gravity-free rest shape of the face as well as the spatially-varying physical material stiffness such that a forward simulation will match the captured targets as closely as possible. As a result, our system can produce actor-specific, physical parameters that can be immediately used in recent physical simulation methods for faces. Furthermore, as the simulation results depend heavily on the chosen spatial layout of material clusters, we analyze and compare different spatial layouts.
Submitted 24 July, 2019;
originally announced July 2019.
-
Transport-Based Neural Style Transfer for Smoke Simulations
Authors:
Byungsoo Kim,
Vinicius C. Azevedo,
Markus Gross,
Barbara Solenthaler
Abstract:
Artistically controlling fluids has always been a challenging task. Optimization techniques rely on approximating simulation states towards target velocity or density field configurations, which are often handcrafted by artists to indirectly control smoke dynamics. Patch synthesis techniques transfer image textures or simulation features to a target flow field. However, these are either limited to adding structural patterns or augmenting coarse flows with turbulent structures, and hence cannot capture the full spectrum of different styles and semantically complex structures. In this paper, we propose the first Transport-based Neural Style Transfer (TNST) algorithm for volumetric smoke data. Our method is able to transfer features from natural images to smoke simulations, enabling general content-aware manipulations ranging from simple patterns to intricate motifs. The proposed algorithm is physically inspired, since it computes the density transport from a source input smoke to a desired target configuration. Our transport-based approach allows direct control over the divergence of the stylization velocity field by optimizing incompressible and irrotational potentials that transport smoke towards stylization. Temporal consistency is ensured by transporting and aligning subsequent stylized velocities, and 3D reconstructions are computed by seamlessly merging stylizations from different camera viewpoints.
Submitted 4 September, 2019; v1 submitted 17 May, 2019;
originally announced May 2019.
-
Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds
Authors:
Martin Simon,
Karl Amende,
Andrea Kraus,
Jens Honer,
Timo Sämann,
Hauke Kaulbersch,
Stefan Milz,
Horst Michael Gross
Abstract:
Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of a state-of-the-art neural-network-based 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce the Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparison of object detections, which speeds up our inference time by up to 20% and halves training time. On top of this, we apply state-of-the-art online multi-target feature tracking on the object measurements to further increase accuracy and robustness utilizing temporal information. Our experiments on KITTI show that we achieve the same results as the state of the art in all related categories, while maintaining the performance and accuracy trade-off and still running in real time. Furthermore, our model is the first one that fuses visual semantics with 3D object detection.
Submitted 16 April, 2019;
originally announced April 2019.
-
Generating Animations from Screenplays
Authors:
Yeyao Zhang,
Eleftheria Tsipidi,
Sasha Schriber,
Mubbasir Kapadia,
Markus Gross,
Ashutosh Modi
Abstract:
Automatically generating animation from natural language text finds application in a number of areas, e.g. movie script writing, instructional videos, and public safety. However, translating natural language text into animation is a challenging task. Existing text-to-animation systems can handle only very simple sentences, which limits their applications. In this paper, we develop a text-to-animation system which is capable of handling complex sentences. We achieve this by introducing a text simplification step into the process. Building on an existing animation generation system for screenwriting, we create a robust NLP pipeline to extract information from screenplays and map it to the system's knowledge base. We develop a set of linguistic transformation rules that simplify complex sentences. Information extracted from the simplified sentences is used to generate a rough storyboard and video depicting the text. Our sentence simplification module outperforms existing systems in terms of BLEU and SARI metrics. We further evaluated our system via a user study: 68% of participants believe that our system generates reasonable animation from input screenplays.
Submitted 10 April, 2019;
originally announced April 2019.
-
Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation
Authors:
Marco Ancona,
Cengiz Öztireli,
Markus Gross
Abstract:
The problem of explaining the behavior of deep neural networks has recently gained a lot of attention. While several attribution methods have been proposed, most come without strong theoretical foundations, which raises questions about their reliability. On the other hand, the literature on cooperative game theory suggests Shapley values as a unique way of assigning relevance scores such that certain desirable properties are satisfied. Unfortunately, the exact evaluation of Shapley values is prohibitively expensive, exponential in the number of input features. In this work, by leveraging recent results on uncertainty propagation, we propose a novel, polynomial-time approximation of Shapley values in deep neural networks. We show that our method produces significantly better approximations of Shapley values than existing state-of-the-art attribution methods.
Submitted 21 June, 2019; v1 submitted 26 March, 2019;
originally announced March 2019.
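As a reading aid for the entry above, the exact Shapley computation that the abstract describes as exponentially expensive can be sketched as follows. This is a minimal illustration of the textbook definition, not the authors' polynomial-time approximation; the names `exact_shapley` and `toy_value` and the three-player game are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n):
    """Exact Shapley values for an n-player game.

    `value` maps a frozenset of player indices to a number.
    The loop visits every subset of the other players for each
    player, so the cost grows exponentially with n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of player i to this coalition.
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_value(coalition):
    # Hypothetical game: additive payoffs plus a bonus of 4
    # when players 0 and 1 cooperate.
    payoffs = [1.0, 2.0, 3.0]
    total = sum(payoffs[j] for j in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


phi = exact_shapley(toy_value, 3)
```

In this toy game the bonus is split evenly between players 0 and 1, and the values sum to the payoff of the full coalition, which is the efficiency property that makes Shapley values attractive as attributions.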
-
Neural Sequential Phrase Grounding (SeqGROUND)
Authors:
Pelin Dogan,
Leonid Sigal,
Markus Gross
Abstract:
We propose an end-to-end approach for phrase grounding in images. Unlike prior methods that typically attempt to ground each phrase independently by building an image-text embedding, our architecture formulates grounding of multiple phrases as a sequential and contextual process. Specifically, we encode region proposals and all phrases into two stacks of LSTM cells, along with so-far grounded phrase-region pairs. These LSTM stacks collectively capture context for grounding of the next phrase. The resulting architecture, which we call SeqGROUND, supports many-to-many matching by allowing an image region to be matched to multiple phrases and vice versa. We show competitive performance on the Flickr30K benchmark dataset and, through ablation studies, validate the efficacy of sequential grounding as well as individual design choices in our model architecture.
Submitted 18 March, 2019;
originally announced March 2019.
-
Disentangled Dynamic Representations from Unordered Data
Authors:
Leonhard Helminger,
Abdelaziz Djelouah,
Markus Gross,
Romann M. Weber
Abstract:
We present a deep generative model that learns disentangled static and dynamic representations of data from unordered input. Our approach exploits regularities in sequential data that exist regardless of the order in which the data is viewed. The result of our factorized graphical model is a well-organized and coherent latent space for data dynamics. We demonstrate our method on several synthetic dynamic datasets and real video data featuring various facial expressions and head poses.
Submitted 10 December, 2018;
originally announced December 2018.
-
On the cost of essentially fair clusterings
Authors:
Ioana O. Bercea,
Martin Groß,
Samir Khuller,
Aounon Kumar,
Clemens Rösner,
Daniel R. Schmidt,
Melanie Schmidt
Abstract:
Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters -- especially if the data is already biased.
At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair $k$-center problem and an $O(t)$-approximation for the fair $k$-median problem, where $t$ is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair $k$-center.
We extend and improve the known results. Firstly, we give a 5-approximation for the fair $k$-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives $k$-center, $k$-supplier, $k$-median, $k$-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
Submitted 26 November, 2018;
originally announced November 2018.
-
Tabby: Explorable Design for 3D Printing Textures
Authors:
Ryo Suzuki,
Koji Yatani,
Mark D. Gross,
Tom Yeh
Abstract:
This paper presents Tabby, an interactive and explorable design tool for 3D printing textures. Tabby allows texture design with direct manipulation in the following workflow: 1) select a target surface, 2) sketch and manipulate a texture with 2D drawings, and then 3) generate 3D printing textures onto an arbitrary curved surface. To enable efficient texture creation, Tabby leverages an auto-completion approach which automates the tedious, repetitive process of applying texture, while allowing flexible customization. Our user evaluation study with seven participants confirms that Tabby can effectively support the design exploration of different patterns for both novice and experienced users.
Submitted 30 October, 2018;
originally announced October 2018.
-
Neural Importance Sampling
Authors:
Thomas Müller,
Brian McWilliams,
Fabrice Rousselle,
Markus Gross,
Jan Novák
Abstract:
We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent components estimation (NICE), which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly increase the modeling power of individual coupling layers. Second, we propose to preprocess the inputs of neural networks using one-blob encoding, which stimulates localization of computation and improves inference. Third, we derive a gradient-descent-based optimization for the KL and the $χ^2$ divergence for the specific application of Monte Carlo integration with unnormalized stochastic estimates of the target distribution. Our approach enables fast and accurate inference and efficient sample generation independently of the dimensionality of the integration domain. We show its benefits on generating natural images and in two applications to light-transport simulation: first, we demonstrate learning of joint path-sampling densities in the primary sample space and importance sampling of multi-dimensional path prefixes thereof. Second, we use our technique to extract conditional directional densities driven by the product of incident illumination and the BSDF in the rendering equation, and we leverage the densities for path guiding. In all applications, our approach yields on-par or higher performance than competing techniques at equal sample count.
Submitted 3 September, 2019; v1 submitted 11 August, 2018;
originally announced August 2018.
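As a reading aid for the entry above, plain Monte Carlo importance sampling, the estimator that the paper builds on, can be sketched as follows. This is a minimal illustration with a fixed, hand-chosen sampling density; it does not implement the learned neural density from the paper. All function names, the integrand, and the density are assumptions made for this example.

```python
import math
import random


def importance_sampling_estimate(f, sample_q, pdf_q, n, rng):
    """Estimate the integral of f by averaging f(x) / q(x) over samples x ~ q."""
    total = 0.0
    for _ in range(n):
        x = sample_q(rng)
        total += f(x) / pdf_q(x)
    return total / n


def integrand(x):
    # Hypothetical integrand on [0, 1]; its exact integral is 1.
    return 3.0 * x * x


def sample_linear(rng):
    # Inverse-CDF sampling from q(x) = 2x on (0, 1]; the CDF is x^2.
    u = 1.0 - rng.random()  # shift to (0, 1] so that q(x) is never zero
    return math.sqrt(u)


def pdf_linear(x):
    return 2.0 * x


estimate = importance_sampling_estimate(
    integrand, sample_linear, pdf_linear, 20000, random.Random(0)
)
```

The closer the sampling density follows the shape of the integrand, the lower the variance of the estimate; the entry above describes learning that density with a neural network instead of fixing it by hand.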
-
Deep Video Color Propagation
Authors:
Simone Meyer,
Victor Cornillère,
Abdelaziz Djelouah,
Christopher Schroers,
Markus Gross
Abstract:
Traditional approaches for color propagation in videos rely on some form of matching between consecutive video frames. Using appearance descriptors, colors are then propagated both spatially and temporally. These methods, however, are computationally expensive and do not take advantage of semantic information of the scene. In this work we propose a deep learning framework for color propagation that combines a local strategy, to propagate colors frame-by-frame ensuring temporal stability, and a global strategy, using semantics for color propagation within a longer range. Our evaluation shows the superiority of our strategy over existing video and image color propagation methods as well as neural photo-realistic style transfer approaches.
Submitted 9 August, 2018;
originally announced August 2018.
-
Ten Years of Research on Intelligent Educational Games for Learning Spelling and Mathematics
Authors:
Barbara Solenthaler,
Severin Klingler,
Tanja Käser,
Markus Gross
Abstract:
In this article, we present our findings from ten years of research on intelligent educational games. We discuss the architecture of our training environments for learning spelling and mathematics, and specifically focus on the representation of the content and the controller that enables personalized trainings. We first show the multi-modal representation that reroutes information through multiple perceptual cues and discuss the game structure. We then present the data-driven student model that is used for a personalized, adaptive presentation of the content. We further leverage machine learning for analytics and visualization tools targeted at teachers and experts. A large data set consisting of training sessions of more than 20,000 children allows statistical interpretations and insights into the nature of learning.
Submitted 7 June, 2018;
originally announced June 2018.
-
Deep Fluids: A Generative Network for Parameterized Fluid Simulations
Authors:
Byungsoo Kim,
Vinicius C. Azevedo,
Nils Thuerey,
Theodore Kim,
Markus Gross,
Barbara Solenthaler
Abstract:
This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative model is able to accurately approximate the training data set, while providing plausible interpolated in-betweens. The proposed generative model is optimized for fluids by a novel loss function that guarantees divergence-free velocity fields at all times. In addition, we demonstrate that we can handle complex parameterizations in reduced spaces, and advance simulations in time by integrating in the latent space with a second network. Our method models a wide variety of fluid behaviors, thus enabling applications such as fast construction of simulations, interpolation of fluids with different parameters, time re-sampling, latent space simulations, and compression of fluid simulation data. Reconstructed velocity fields are generated up to 700x faster than re-simulating the data with the underlying CPU solver, while achieving compression rates of up to 1300x.
Submitted 1 February, 2019; v1 submitted 6 June, 2018;
originally announced June 2018.
-
PhaseNet for Video Frame Interpolation
Authors:
Simone Meyer,
Abdelaziz Djelouah,
Brian McWilliams,
Alexander Sorkine-Hornung,
Markus Gross,
Christopher Schroers
Abstract:
Most approaches for video frame interpolation require accurate dense correspondences to synthesize an in-between frame. Therefore, they do not perform well in challenging scenarios with e.g. lighting changes or motion blur. Recent deep learning approaches that rely on kernels to represent motion can only alleviate these problems to some extent. In those cases, methods that use a per-pixel phase-based motion representation have been shown to work well. However, they are only applicable for a limited amount of motion. We propose a new approach, PhaseNet, that is designed to robustly handle challenging scenarios while also coping with larger motion. Our approach consists of a neural network decoder that directly estimates the phase decomposition of the intermediate frame. We show that this is superior to the hand-crafted heuristics previously used in phase-based methods and also compares favorably to recent deep learning based approaches for video frame interpolation on challenging datasets.
Submitted 3 April, 2018;
originally announced April 2018.
-
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
Authors:
Pelin Dogan,
Boyang Li,
Leonid Sigal,
Markus Gross
Abstract:
The alignment of heterogeneous sequential data (video to text) is an important and challenging problem. Standard techniques for this task, including Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from inherent drawbacks. Mainly, the Markov assumption implies that, given the immediate past, future alignment decisions are independent of further history. The separation between similarity computation and alignment decision also prevents end-to-end training. In this paper, we propose an end-to-end neural architecture where alignment actions are implemented as moving data between stacks of Long Short-term Memory (LSTM) blocks. This flexible architecture supports a large variety of alignment tasks, including one-to-one, one-to-many, skipping unmatched elements, and (with extensions) non-monotonic alignment. Extensive experiments on semi-synthetic and real datasets show that our algorithm outperforms state-of-the-art baselines.
Submitted 9 April, 2018; v1 submitted 19 February, 2018;
originally announced March 2018.
-
Towards better understanding of gradient-based attribution methods for Deep Neural Networks
Authors:
Marco Ancona,
Enea Ceolini,
Cengiz Öztireli,
Markus Gross
Abstract:
Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical comparison has been performed in the past. In this work, we analyze four gradient-based attribution methods and formally prove conditions of equivalence and approximation between them. By reformulating two of these methods, we construct a unified framework which enables a direct comparison, as well as an easier implementation. Finally, we propose a novel evaluation metric, called Sensitivity-n, and test the gradient-based attribution methods alongside a simple perturbation-based attribution method on several datasets in the domains of image and text classification, using various network architectures.
Submitted 7 March, 2018; v1 submitted 16 November, 2017;
originally announced November 2017.
-
Deep Scattering: Rendering Atmospheric Clouds with Radiance-Predicting Neural Networks
Authors:
Simon Kallweit,
Thomas Müller,
Brian McWilliams,
Markus Gross,
Jan Novák
Abstract:
We present a technique for efficiently synthesizing images of atmospheric clouds using a combination of Monte Carlo integration and neural networks. The intricacies of Lorenz-Mie scattering and the high albedo of cloud-forming aerosols make rendering of clouds---e.g. the characteristic silverlining and the "whiteness" of the inner body---challenging for methods based solely on Monte Carlo integration or diffusion theory. We approach the problem differently. Instead of simulating all light transport during rendering, we pre-learn the spatial and directional distribution of radiant flux from tens of cloud exemplars. To render a new scene, we sample visible points of the cloud and, for each, extract a hierarchical 3D descriptor of the cloud geometry with respect to the shading location and the light source. The descriptor is input to a deep neural network that predicts the radiance function for each shading configuration. We make the key observation that progressively feeding the hierarchical descriptor into the network enhances the network's ability to learn faster and predict with high accuracy while using few coefficients. We also employ a block design with residual connections to further improve performance. A GPU implementation of our method synthesizes images of clouds that are nearly indistinguishable from the reference solution within seconds interactively. Our method thus represents a viable solution for applications such as cloud design and, thanks to its temporal stability, also for high-quality production of animated content.
Submitted 15 September, 2017;
originally announced September 2017.
-
On the Complexity of Instationary Gas Flows
Authors:
Martin Groß,
Marc E. Pfetsch,
Martin Skutella
Abstract:
We study a simplistic model of instationary gas flows consisting of a sequence of k stationary gas flows. We present efficiently solvable cases and NP-hardness results, establishing complexity gaps between stationary and instationary gas flows (already for k=2) as well as between instationary gas s-t-flows and instationary gas b-flows.
Submitted 29 August, 2017;
originally announced August 2017.
-
FluxMarker: Enhancing Tactile Graphics with Dynamic Tactile Markers
Authors:
Ryo Suzuki,
Abigale Stangl,
Mark D. Gross,
Tom Yeh
Abstract:
For people with visual impairments, tactile graphics are an important means to learn and explore information. However, raised line tactile graphics created with traditional materials such as embossing are static. While available refreshable displays can dynamically change the content, they are still too expensive for many users, and are limited in size. These factors limit wide-spread adoption and the representation of large graphics or data sets. In this paper, we present FluxMarker, an inexpensive scalable system that renders dynamic information on top of static tactile graphics with movable tactile markers. These dynamic tactile markers can be easily reconfigured and used to annotate static raised line tactile graphics, including maps, graphs, and diagrams. We developed a hardware prototype that actuates magnetic tactile markers driven by low-cost and scalable electromagnetic coil arrays, which can be fabricated with standard printed circuit board manufacturing. We evaluated our prototype with six participants with visual impairments and found positive results across four application areas: location finding or navigating on tactile maps, data analysis and physicalization, feature identification for tactile graphics, and drawing support. The user study confirms advantages in application domains such as education and data exploration.
Submitted 12 August, 2017;
originally announced August 2017.
-
A Local-Search Algorithm for Steiner Forest
Authors:
Martin Groß,
Anupam Gupta,
Amit Kumar,
Jannik Matuschke,
Daniel R. Schmidt,
Melanie Schmidt,
José Verschae
Abstract:
In the Steiner Forest problem, we are given a graph and a collection of source-sink pairs, and the goal is to find a subgraph of minimum total length such that all pairs are connected. The problem is APX-Hard and can be 2-approximated by, e.g., the elegant primal-dual algorithm of Agrawal, Klein, and Ravi from 1995.
We give a local-search-based constant-factor approximation for the problem. Local search brings in new techniques to an area that has for long not seen any improvements and might be a step towards a combinatorial algorithm for the more general survivable network design problem. Moreover, local search was an essential tool to tackle the dynamic MST/Steiner Tree problem, whereas dynamic Steiner Forest is still wide open.
It is easy to see that any constant factor local search algorithm requires steps that add/drop many edges together. We propose natural local moves which, at each step, either (a) add a shortest path in the current graph and then drop a bunch of inessential edges, or (b) add a set of edges to the current solution. This second type of move is motivated by the potential function we use to measure progress, combining the cost of the solution with a penalty for each connected component. Our carefully-chosen local moves and potential function work in tandem to eliminate bad local minima that arise when using more traditional local moves.
Submitted 10 July, 2017;
originally announced July 2017.
-
General Bounds for Incremental Maximization
Authors:
Aaron Bernstein,
Yann Disser,
Martin Groß
Abstract:
We propose a theoretical framework to capture incremental solutions to cardinality constrained maximization problems. The defining characteristic of our framework is that the cardinality/support of the solution is bounded by a value $k\in\mathbb{N}$ that grows over time, and we allow the solution to be extended one element at a time. We investigate the best-possible competitive ratio of such an incremental solution, i.e., the worst ratio over all $k$ between the incremental solution after $k$ steps and an optimum solution of cardinality $k$. We define a large class of problems that contains many important cardinality constrained maximization problems like maximum matching, knapsack, and packing/covering problems. We provide a general $2.618$-competitive incremental algorithm for this class of problems, and show that no algorithm can have competitive ratio below $2.18$ in general.
In the second part of the paper, we focus on the inherently incremental greedy algorithm that increases the objective value as much as possible in each step. This algorithm is known to be $1.58$-competitive for submodular objective functions, but it has unbounded competitive ratio for the class of incremental problems mentioned above. We define a relaxed submodularity condition for the objective function, capturing problems like maximum (weighted) ($b$-)matching and a variant of the maximum flow problem. We show that the greedy algorithm has competitive ratio (exactly) $2.313$ for the class of problems that satisfy this relaxed submodularity condition.
Note that our upper bounds on the competitive ratios translate to approximation ratios for the underlying cardinality constrained problems.
Submitted 17 April, 2018; v1 submitted 29 May, 2017;
originally announced May 2017.
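The greedy incremental algorithm and the competitive ratio described in the abstract above can be illustrated on a toy instance. The following is only a minimal sketch, assuming a small set-coverage objective (a submodular function chosen here for illustration, not taken from the paper); the function names `coverage`, `greedy_incremental`, and `competitive_ratio` are hypothetical, and the optimum for each cardinality is found by brute force.

```python
from itertools import combinations


def coverage(sets, chosen):
    """Illustrative objective: number of ground elements covered by the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy_incremental(sets):
    """Order the set indices greedily; the first k indices are the solution for cardinality k."""
    order = []
    remaining = set(range(len(sets)))
    while remaining:
        # Pick the element that increases the objective the most (ties: smallest index).
        best = max(sorted(remaining), key=lambda i: coverage(sets, order + [i]))
        order.append(best)
        remaining.remove(best)
    return order


def competitive_ratio(sets, order):
    """Worst ratio over all k of the optimum of cardinality k to the incremental value after k steps."""
    worst = 1.0
    for k in range(1, len(sets) + 1):
        opt = max(coverage(sets, c) for c in combinations(range(len(sets)), k))
        inc = coverage(sets, order[:k])
        if inc == 0:
            # The incremental solution has no value; the ratio is unbounded if the optimum has any.
            if opt > 0:
                return float("inf")
            continue
        worst = max(worst, opt / inc)
    return worst


if __name__ == "__main__":
    # Greedy takes the largest set first, which is suboptimal for k = 2.
    example = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}]
    order = greedy_incremental(example)
    print(order, competitive_ratio(example, order))
```

On this instance greedy commits to the four-element set first, so for $k=2$ it covers five elements while the optimum covers six, giving a ratio of $1.2$, which is consistent with the $1.58$ bound quoted for submodular objectives.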
-
Autocomplete Textures for 3D Printing
Authors:
Ryo Suzuki,
Tom Yeh,
Koji Yatani,
Mark D. Gross
Abstract:
Texture is an essential property of physical objects that affects aesthetics, usability, and functionality. However, designing and applying textures to 3D objects with existing tools remains difficult and time-consuming; it requires proficient 3D modeling skills. To address this, we investigated an auto-completion approach for efficient texture creation that automates the tedious, repetitive process of applying texture while allowing flexible customization. We developed techniques for users to select a target surface, sketch and manipulate a texture with 2D drawings, and then generate 3D printable textures onto an arbitrary curved surface. In a controlled experiment our tool sped texture creation by 80% over conventional tools, a performance gain that is higher with more complex target surfaces. This result confirms that auto-completion is powerful for creating 3D textures.
Submitted 16 March, 2017;
originally announced March 2017.
-
A $\frac{3}{2}$-Approximation Algorithm for Tree Augmentation via Chvátal-Gomory Cuts
Authors:
Samuel Fiorini,
Martin Groß,
Jochen Könemann,
Laura Sanità
Abstract:
The weighted tree augmentation problem (WTAP) is a fundamental network design problem. We are given an undirected tree $G = (V,E)$, an additional set of edges $L$ called links and a cost vector $c \in \mathbb{R}^L_{\geq 1}$. The goal is to choose a minimum cost subset $S \subseteq L$ such that $G = (V, E \cup S)$ is $2$-edge-connected. In the unweighted case, that is, when we have $c_\ell = 1$ for all $\ell \in L$, the problem is called the tree augmentation problem (TAP).
Both problems are known to be APX-hard, and the best known approximation factors are $2$ for WTAP by (Frederickson and JáJá, '81) and $\tfrac{3}{2}$ for TAP due to (Kortsarz and Nutov, TALG '16). For the case where all link costs are bounded by a constant $M$, (Adjiashvili, SODA '17) recently gave a $\approx 1.96418+\varepsilon$-approximation algorithm for WTAP. This is the first approximation with a better guarantee than $2$ that does not require restrictions on the structure of the tree or the links.
In this paper, we improve Adjiashvili's approximation to a $\frac{3}{2}+\varepsilon$-approximation for WTAP under the bounded cost assumption. We achieve this by introducing a strong LP that combines $\{0,\frac{1}{2}\}$-Chvátal-Gomory cuts for the standard LP for the problem with bundle constraints from Adjiashvili. We show that our LP can be solved efficiently and that it is exact for some instances that arise at the core of Adjiashvili's approach. This results in the improved guarantee of $\frac{3}{2}+\varepsilon$. For TAP, this is the best known LP-based result, and matches the bound of $\frac{3}{2}+\varepsilon$ achieved by the best SDP-based algorithm due to (Cheriyan and Gao, arXiv '15).
Submitted 23 February, 2017; v1 submitted 17 February, 2017;
originally announced February 2017.
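The WTAP definition in the abstract above can be made concrete with a tiny exhaustive solver. This is only a sketch of the problem statement, not the LP-based algorithm of the paper: it tries every subset of links (exponential time) and checks 2-edge-connectivity directly by deleting each edge in turn. The function names and the example instance are hypothetical.

```python
from itertools import combinations


def connected(n, edges):
    """Check whether the undirected multigraph on vertices 0..n-1 is connected."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def two_edge_connected(n, edges):
    """A graph is 2-edge-connected if it stays connected after deleting any single edge."""
    return connected(n, edges) and all(
        connected(n, edges[:i] + edges[i + 1:]) for i in range(len(edges))
    )


def solve_wtap(n, tree_edges, links, costs):
    """Return (minimum cost, chosen link indices) by brute force over all subsets of links."""
    best_cost, best = float("inf"), None
    for r in range(len(links) + 1):
        for idx in combinations(range(len(links)), r):
            cost = sum(costs[i] for i in idx)
            augmented = tree_edges + [links[i] for i in idx]
            if cost < best_cost and two_edge_connected(n, augmented):
                best_cost, best = cost, idx
    return best_cost, best


if __name__ == "__main__":
    # Path tree 0-1-2-3 with three candidate links, all costs at least 1.
    tree = [(0, 1), (1, 2), (2, 3)]
    links = [(0, 2), (1, 3), (0, 3)]
    costs = [1, 1, 3]
    print(solve_wtap(4, tree, links, costs))
```

In this instance the single link $(0,3)$ closes a cycle at cost $3$, but the two cheaper links $(0,2)$ and $(1,3)$ together also remove every bridge at total cost $2$.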
-
Scheduling Maintenance Jobs in Networks
Authors:
Fidaa Abed,
Lin Chen,
Yann Disser,
Martin Groß,
Nicole Megow,
Julie Meißner,
Alexander T. Richter,
Roman Rischke
Abstract:
We investigate the problem of scheduling the maintenance of edges in a network, motivated by the goal of minimizing outages in transportation or telecommunication networks. We focus on maintaining connectivity between two nodes over time; for the special case of path networks, this is related to the problem of minimizing the busy time of machines.
We show that the problem can be solved in polynomial time in arbitrary networks if preemption is allowed. If preemption is restricted to integral time points, the problem is NP-hard and in the non-preemptive case we give strong non-approximability results. Furthermore, we give tight bounds on the power of preemption, that is, the maximum ratio of the values of non-preemptive and preemptive optimal solutions.
Interestingly, the preemptive and the non-preemptive problem can be solved efficiently on paths, whereas we show that mixing both leads to a weakly NP-hard problem that allows for a simple 2-approximation.
Submitted 30 January, 2017;
originally announced January 2017.