-
End-to-End Optimized Pipeline for Prediction of Protein Folding Kinetics
Authors:
Vijay Arvind. R,
Haribharathi Sivakumar,
Brindha. R
Abstract:
Protein folding is the intricate process by which a linear sequence of amino acids self-assembles into a unique three-dimensional structure. Protein folding kinetics is the study of the pathways and time-dependent mechanisms a protein undergoes when it folds. Understanding protein folding kinetics is essential because a protein must fold correctly to perform its biological functions optimally, and a misfolded protein can be contorted into shapes that are not suited to the cellular environment, giving rise to many degenerative and neurodegenerative disorders and amyloid diseases. Monitoring at-risk individuals and detecting discrepancies in a protein's folding kinetics at an early stage could yield major public health benefits, as preventive measures can then be taken. This research proposes an efficient pipeline for predicting protein folding kinetics with high accuracy and a low memory footprint. The deployed machine learning (ML) model outperformed state-of-the-art ML models by 4.8% in accuracy while consuming 327x less memory and being 7.3% faster.
Submitted 17 September, 2023;
originally announced September 2023.
-
AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
Authors:
Mohit Mendiratta,
Xingang Pan,
Mohamed Elgharib,
Kartik Teotia,
Mallikarjun B R,
Ayush Tewari,
Vladislav Golyanik,
Adam Kortylewski,
Christian Theobalt
Abstract:
Capturing and editing full head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs and others. While these data modalities provide effective means of control, they mostly focus on editing the head movements such as the facial expressions, head pose and/or camera viewpoint. In this paper, we propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar. Our approach builds on existing work to capture dynamic performances of human heads using neural radiance field (NeRF) and edits this representation with a text-to-image diffusion model. Specifically, we introduce an optimization strategy for incorporating multiple keyframes representing different camera viewpoints and time stamps of a video performance into a single diffusion model. Using this personalized diffusion model, we edit the dynamic NeRF by introducing view-and-time-aware Score Distillation Sampling (VT-SDS) following a model-based guidance approach. Our method edits the full head in a canonical space, and then propagates these edits to remaining time steps via a pretrained deformation network. We evaluate our method visually and numerically via a user study, and results show that our method outperforms existing approaches. Our experiments validate the design choices of our method and highlight that our edits are genuine, personalized, as well as 3D- and time-consistent.
Submitted 2 June, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
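For reference, the score distillation sampling (SDS) objective that VT-SDS builds on is usually written, following DreamFusion, as the gradient below. Here $x = g(\theta)$ is an image rendered from the NeRF parameters $\theta$, $z_t$ is its noised version at diffusion step $t$ with noise $\epsilon$, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction conditioned on the text prompt $y$, and $w(t)$ is a step-dependent weight. This is the generic SDS form only; the view- and time-aware conditioning that the abstract describes for VT-SDS is not specified here.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right]
```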
-
A Dynamic Heterogeneous Team-based Non-iterative Approach for Online Pick-up and Just-In-Time Delivery Problems
Authors:
Shridhar Velhal,
Srikrishna B R,
Mukunda Bharatheesha,
Suresh Sundaram
Abstract:
This paper presents a non-iterative approach for finding the assignment of heterogeneous robots to efficiently execute online Pickup and Just-In-Time Delivery (PJITD) tasks with optimal resource utilization. The PJITD assignments problem is formulated as a spatio-temporal multi-task assignment (STMTA) problem. The physical constraints on the map and vehicle dynamics are incorporated in the cost formulation. The linear sum assignment problem is formulated for the heterogeneous STMTA problem. The recently proposed Dynamic Resource Allocation with Multi-task assignments (DREAM) approach has been modified to solve the heterogeneous PJITD problem. At the start, it computes the minimum number of robots required (with their types) to execute given heterogeneous PJITD tasks. These required robots are added to the team to guarantee the feasibility of all PJITD tasks. Then robots in an updated team are assigned to execute the PJITD tasks while minimizing the total cost for the team to execute all PJITD tasks. The performance of the proposed non-iterative approach has been validated using high-fidelity software-in-loop simulations and hardware experiments. The simulations and experimental results clearly indicate that the proposed approach is scalable and provides optimal resource utilization.
Submitted 14 April, 2023;
originally announced April 2023.
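The linear sum assignment formulation mentioned in this abstract can be illustrated with a toy cost matrix in which rows are robots and columns are tasks. The sketch below, using hypothetical costs, enumerates every one-to-one assignment and keeps the cheapest. It only illustrates the LSA problem itself, not the modified DREAM approach; practical solvers use polynomial-time methods such as the Hungarian or Jonker-Volgenant algorithms (SciPy's `scipy.optimize.linear_sum_assignment` is one implementation).

```python
from itertools import permutations


def brute_force_lsa(cost):
    """Return (total_cost, assignment) minimizing sum(cost[r][assignment[r]]).

    Assumes len(cost) <= len(cost[0]) so every robot (row) gets a distinct task
    (column). Exhaustive search is O(n!) and only suitable for tiny examples.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many tasks as robots")
    best_total, best_cols = float("inf"), None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, list(cols)
    return best_total, best_cols


# Hypothetical robot-to-task costs (e.g. travel time plus a lateness penalty).
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = brute_force_lsa(cost)
print(total, assignment)  # 5 [1, 0, 2]
```

Robot 0 takes task 1, robot 1 takes task 0, and robot 2 takes task 2, for a total cost of 5.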
-
GVP: Generative Volumetric Primitives
Authors:
Mallikarjun B R,
Xingang Pan,
Mohamed Elgharib,
Christian Theobalt
Abstract:
Advances in 3D-aware generative models have pushed the boundary of image synthesis with explicit camera control. To achieve high-resolution image synthesis, several attempts have been made to design efficient generators, such as hybrid architectures with both 3D and 2D components. However, such a design compromises multiview consistency, and the design of a pure 3D generator with high resolution is still an open problem. In this work, we present Generative Volumetric Primitives (GVP), the first pure 3D generative model that can sample and render 512-resolution images in real-time. GVP jointly models a number of volumetric primitives and their spatial information, both of which can be efficiently generated via a 2D convolutional network. The mixture of these primitives naturally captures the sparsity and correspondence in the 3D volume. The training of such a generator with a high degree of freedom is made possible through a knowledge distillation technique. Experiments on several datasets demonstrate superior efficiency and 3D consistency of GVP over the state-of-the-art.
Submitted 31 March, 2023;
originally announced March 2023.
-
HQ3DAvatar: High Quality Controllable 3D Head Avatar
Authors:
Kartik Teotia,
Mallikarjun B R,
Xingang Pan,
Hyeongwoo Kim,
Pablo Garrido,
Mohamed Elgharib,
Christian Theobalt
Abstract:
Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capturing full-head dynamic performances is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometric details such as the mouth interior, hair, and topological changes over time. This paper presents a novel approach to building highly photorealistic digital head avatars. Our method learns a canonical space via an implicit function parameterized by a neural network. It leverages multiresolution hash encoding in the learned feature space, allowing for high-quality, faster training and high-resolution rendering. At test time, our method is driven by a monocular RGB video. Here, an image encoder extracts face-specific features that also condition the learnable canonical space. This encourages deformation-dependent texture variations during training. We also propose a novel optical-flow-based loss that ensures correspondences in the learned canonical space, thus encouraging artifact-free and temporally consistent renderings. We show results on challenging facial expressions and show free-viewpoint renderings at interactive real-time rates for medium image resolutions. Our method outperforms all existing approaches, both visually and numerically. We will release our multiple-identity dataset to encourage further research. Our project page is available at: https://vcai.mpi-inf.mpg.de/projects/HQ3DAvatar/
Submitted 25 March, 2023;
originally announced March 2023.
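The abstract does not spell out its hash encoding, so the sketch below assumes the standard multiresolution hash encoding from Instant-NGP. In that scheme, each level is a grid whose resolution grows geometrically, and each grid corner is mapped to a slot in a fixed-size feature table by XOR-ing its coordinates multiplied by per-dimension primes. Only the indexing step is shown. Instant-NGP then trilinearly interpolates the 8 corner features per level and concatenates the levels; it also indexes coarse levels directly when they fit in the table, whereas this simplified sketch hashes every level. The table size and resolutions are hypothetical.

```python
import math

# Per-dimension primes from the Instant-NGP spatial hash (pi_1 = 1 by design).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(ix, iy, iz, table_size):
    """Hash integer grid-corner coordinates to a slot in a table of table_size entries.

    With table_size a power of two, Python's unbounded integers give the same low
    bits as the uint32 arithmetic used in GPU implementations.
    """
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size


def level_resolutions(n_min, n_max, n_levels):
    """Grid resolutions growing geometrically from n_min to roughly n_max."""
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return [math.floor(n_min * growth ** level) for level in range(n_levels)]


def corner_slots(point, resolution, table_size):
    """Table slots of the 8 grid corners surrounding a point in [0, 1]^3."""
    base = [math.floor(c * resolution) for c in point]
    return [
        spatial_hash(base[0] + dx, base[1] + dy, base[2] + dz, table_size)
        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
    ]


T = 2 ** 14  # hypothetical number of feature-table entries per level
for res in level_resolutions(16, 512, 4):
    print(res, corner_slots((0.3, 0.7, 0.5), res, T))
```

Hash collisions at fine levels are tolerated by design: the network learns to resolve them, which keeps the memory cost fixed regardless of resolution.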
-
LiveHand: Real-time and Photorealistic Neural Hand Rendering
Authors:
Akshay Mundra,
Mallikarjun B R,
Jiayi Wang,
Marc Habermann,
Christian Theobalt,
Mohamed Elgharib
Abstract:
The human hand is the main medium through which we interact with our surroundings, making its digitization an important problem. While there are several works modeling the geometry of hands, little attention has been paid to capturing photo-realistic appearance. Moreover, for applications in extended reality and gaming, real-time rendering is critical. We present the first neural-implicit approach to photo-realistically render hands in real-time. This is a challenging problem as hands are textured and undergo strong articulations with pose-dependent effects. However, we show that this aim is achievable through our carefully designed method. This includes training on a low-resolution rendering of a neural radiance field, together with a 3D-consistent super-resolution module and mesh-guided sampling and space canonicalization. We demonstrate a novel application of perceptual loss on the image space, which is critical for learning details accurately. We also show a live demo where we photo-realistically render the human hand in real-time for the first time, while also modeling pose- and view-dependent appearance effects. We ablate all our design choices and show that they optimize for rendering speed and quality. Video results and our code can be accessed from https://vcai.mpi-inf.mpg.de/projects/LiveHand/
Submitted 20 August, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Factors that affect Camera based Self-Monitoring of Vitals in the Wild
Authors:
Nikhil S. Narayan,
Shashanka B. R.,
Rohit Damodaran,
Dr. Chandrashekhar Jayaram,
Dr. M. A. Kareem,
Dr. Mamta P.,
Dr. Saravanan K. R.,
Dr. Monu Krishnan,
Dr. Raja Indana
Abstract:
The reliability of self-monitoring of vitals in the wild, whether with medical devices, wearables, or camera-based smartphone solutions, is subject to variability arising from factors such as placement position, device hardware, and the environment. In this first-of-its-kind study, we demonstrate on 203 healthy subjects that this variability in self-monitoring of blood pressure (BP), blood oxygen saturation (SpO2), and heart rate (HR) is statistically significant (p < 0.05) by quantifying positional and hardware variability. We also establish the existence of this variability in camera-based smartphone solutions for self-monitoring of vitals, and thus prove that the use of camera-based smartphone solutions is similar to the use of medical devices or wearables for self-monitoring in the wild.
Submitted 30 January, 2023;
originally announced January 2023.
-
When elephants nodded and dolls spoke: Bringing together robotics and storytelling for environmental literacy
Authors:
Mukil M. V.,
Gayathri Manikutty,
Divya Vijayan,
Aparna Rangudu,
Bhavani Rao R
Abstract:
Inculcating principles of environmental stewardship among children and youth is urgently needed today to create a sustainable future. This paper presents a model for promoting environmental literacy in India using storytelling-based workshops that focus on STEM education, including computational thinking, robotics, and maker skills. During the workshop, participants build a robotic diorama with digital animations and animatronics to tell their story. Our initial observations from pilot studies conducted in 2019 in six rural and semi-urban schools in India showed that the children were deeply engaged and enthusiastic throughout the workshop, making the entire learning experience meaningful and joyful for all.
Submitted 19 December, 2022;
originally announced December 2022.
-
State of the Art in Dense Monocular Non-Rigid 3D Reconstruction
Authors:
Edith Tretschk,
Navami Kairanda,
Mallikarjun B R,
Rishabh Dabral,
Adam Kortylewski,
Bernhard Egger,
Marc Habermann,
Pascal Fua,
Christian Theobalt,
Vladislav Golyanik
Abstract:
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since -- without additional prior assumptions -- it permits infinitely many solutions leading to accurate projection to the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of using monocular cameras is their omnipresence and availability to the end users as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods -- that handle arbitrary scenes and make only a few prior assumptions -- and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g. human faces, bodies, hands, and animals). A significant part of this STAR is also devoted to classification and a high-level comparison of the methods, as well as an overview of the datasets for training and evaluation of the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods.
Submitted 24 March, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Designing Interference-Immune Doppler-Tolerant Waveforms for Automotive Radar Applications
Authors:
Robin Amar,
Mohammad Alaee-Kerahroodi,
Prabhu Babu,
Bhavani Shankar M. R
Abstract:
Dynamic target detection using FMCW waveforms is challenging in the presence of interference in various radar applications. The degradation in SNR is irreparable, and interference is difficult to mitigate in the time and frequency domains. In this paper, a waveform design problem is addressed using the Majorization-Minimization (MM) framework by considering PSL/ISL cost functions, resulting in a code sequence with the Doppler-tolerance characteristics of an FMCW waveform and the interference-immune characteristics of a tailored PMCW waveform (unique phase code + minimal ISL/PSL). The optimally designed sequences possess polynomial phase behavior of degree Q among their sub-sequences and obtain optimal ISL and PSL solutions with guaranteed convergence. By tuning optimization parameters such as the degree Q of the polynomial phase behavior, the sub-sequence length M, and the total number of sub-sequences L, the optimized sequences can be as Doppler tolerant as an FMCW waveform at one end, and can possess small cross-correlation values similar to the random-phase sequences of a PMCW waveform at the other end. If required in the event of acute interference, new codes that have low cross-correlation with the interferers can be generated at runtime. The performance analysis indicates that the proposed method outperforms state-of-the-art counterparts.
Submitted 5 April, 2022;
originally announced April 2022.
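The PSL/ISL cost functions in this abstract are built on the aperiodic autocorrelation of the code sequence. The sketch below computes both metrics and generates a unimodular polynomial-phase sequence; it does not implement the MM optimization itself. ISL conventions vary (some definitions sum over both positive and negative lags, which doubles the value); this sketch assumes the positive-lag convention, and the chirp rate used in the example is hypothetical.

```python
import cmath


def aperiodic_autocorrelation(x):
    """r[k] = sum_n x[n + k] * conj(x[n]) for lags k = 0 .. N-1."""
    n = len(x)
    return [sum(x[i + k] * x[i].conjugate() for i in range(n - k)) for k in range(n)]


def isl_psl(x):
    """Integrated and peak sidelobe levels over the positive lags k = 1 .. N-1."""
    sidelobes = aperiodic_autocorrelation(x)[1:]
    isl = sum(abs(r) ** 2 for r in sidelobes)
    psl = max(abs(r) for r in sidelobes)
    return isl, psl


def polynomial_phase(coeffs, length):
    """Unimodular x[n] = exp(j * sum_q coeffs[q] * n**q), of degree len(coeffs) - 1."""
    return [cmath.exp(1j * sum(c * n ** q for q, c in enumerate(coeffs))) for n in range(length)]


# Barker-13 code as a sanity check: its autocorrelation sidelobes are all 0 or 1.
barker13 = [complex(v) for v in (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)]
print(isl_psl(barker13))  # (6.0, 1.0)

# A degree-2 (linear-FM-like) phase code of length 64, i.e. Q = 2.
chirp = polynomial_phase([0.0, 0.0, cmath.pi / 64], 64)
print(isl_psl(chirp))
```

A quadratic phase corresponds to the linear frequency sweep of an FMCW chirp, which is why a degree-Q polynomial phase within each sub-sequence can retain Doppler tolerance.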
-
Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images
Authors:
Ayush Tewari,
Mallikarjun B R,
Xingang Pan,
Ohad Fried,
Maneesh Agrawala,
Christian Theobalt
Abstract:
Learning 3D generative models from a dataset of monocular images enables self-supervised 3D reasoning and controllable synthesis. State-of-the-art 3D generative models are GANs which use neural 3D volumetric representations for synthesis. Images are synthesized by rendering the volumes from a given camera. These models can disentangle the 3D scene from the camera viewpoint in any generated image. However, most models do not disentangle other factors of image formation, such as geometry and appearance. In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations. Our model can disentangle the geometry and appearance variations in the scene, i.e., we can independently sample from the geometry and appearance spaces of the generative model. This is achieved using a novel non-rigid deformable scene formulation. A 3D volume which represents an object instance is computed as a non-rigidly deformed canonical 3D volume. Our method learns the canonical volume, as well as its deformations, jointly during training. This formulation also helps us improve the disentanglement between the 3D scene and the camera viewpoints using a novel pose regularization loss defined on the 3D deformation field. In addition, we further model the inverse deformations, enabling the computation of dense correspondences between images generated by our model. Finally, we design an approach to embed real images into the latent space of our disentangled generative model, enabling editing of real images.
Submitted 29 March, 2022;
originally announced March 2022.
-
ARC Nav -- A 3D Navigation Stack for Autonomous Robots
Authors:
Vishwas N. S,
Srikrishna B. R,
Sudarshan T. S. B
Abstract:
Popular navigation stacks implemented on top of open-source frameworks such as ROS (Robot Operating System) and ROS2 represent the robot workspace using a discretized 2D occupancy grid. This method, while requiring less computation, restricts such navigation stacks to wheeled robots navigating on flat surfaces. In this paper, we present a navigation stack that uses a volumetric representation of the robot workspace and can hence be extended to aerial and legged robots navigating through uneven terrain. Additionally, we present a new sampling-based motion planning algorithm that introduces a bi-directional approach to the Batch Informed Trees (BIT*) motion planning algorithm, while wrapping it with a strategy-switching approach in order to reduce both the initial time taken to find a path and the time taken to find the shortest path.
Submitted 12 November, 2021;
originally announced November 2021.
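To make the 2D-versus-volumetric distinction in this abstract concrete, the sketch below stores occupancy in 3D voxels, so that two obstacles at the same (x, y) but different heights remain distinct, whereas a 2D grid would merge them. It is a hypothetical minimal structure, not ARC Nav's map: it keeps only a set of occupied voxel indices and treats all other space as free. Production systems often use probabilistic octree maps such as OctoMap instead.

```python
import math


class VoxelOccupancy:
    """Sparse 3D occupancy map storing the integer indices of occupied voxels.

    Simplification: any voxel not marked occupied is reported as free; there is
    no separate 'unknown' state and no probabilistic sensor update.
    """

    def __init__(self, resolution):
        self.resolution = resolution  # voxel edge length in metres
        self.occupied = set()

    def to_index(self, x, y, z):
        """Map a world point to the integer index of the voxel containing it."""
        r = self.resolution
        return (math.floor(x / r), math.floor(y / r), math.floor(z / r))

    def mark_occupied(self, x, y, z):
        self.occupied.add(self.to_index(x, y, z))

    def is_free(self, x, y, z):
        return self.to_index(x, y, z) not in self.occupied


grid = VoxelOccupancy(resolution=0.1)
grid.mark_occupied(1.03, 2.04, 0.55)   # e.g. a point from a depth sensor
print(grid.is_free(1.07, 2.06, 0.52))  # same voxel -> False
print(grid.is_free(1.03, 2.04, 1.55))  # same (x, y), 1 m higher -> True
```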
-
NASA Space Robotics Challenge 2 Qualification Round: An Approach to Autonomous Lunar Rover Operations
Authors:
Cagri Kilic,
Bernardo Martinez R. Jr.,
Christopher A. Tatsch,
Jared Beard,
Jared Strader,
Shounak Das,
Derek Ross,
Yu Gu,
Guilherme A. S. Pereira,
Jason N. Gross
Abstract:
Plans for establishing a long-term human presence on the Moon will require substantial increases in robot autonomy and multi-robot coordination to support a lunar outpost. To achieve these objectives, algorithm design choices for software development need to be tested and validated for expected scenarios such as autonomous in-situ resource utilization (ISRU), localization in challenging environments, and multi-robot coordination. However, real-world experiments are extremely challenging and limited for extraterrestrial environments. Moreover, realistic simulation demonstrations of these environments are still rare, even though they are needed for initial algorithm testing. To address some of these needs, the NASA Centennial Challenges program established the Space Robotics Challenge Phase 2 (SRC2), which consists of virtual robotic systems in a realistic lunar simulation environment, where a group of mobile robots was tasked with reporting volatile locations within a global map, excavating and transporting these resources, and detecting and localizing a target of interest. The main goal of this article is to share our team's experiences with the design trade-offs involved in performing autonomous robotic operations in a virtual lunar environment, and to share strategies for completing the mission requirements posed by the NASA SRC2 competition during the qualification round. Of the 114 teams that registered for participation in NASA SRC2, team Mountaineers finished as one of only six teams to receive the top qualification round prize.
Submitted 20 September, 2021;
originally announced September 2021.
-
Heterogeneously-Distributed Joint Radar Communications: Bayesian Resource Allocation
Authors:
Linlong Wu,
Kumar Vijay Mishra,
Bhavani Shankar M. R.,
Björn Ottersten
Abstract:
Due to spectrum scarcity, the coexistence of radar and wireless communication has recently gained substantial research interest. Among many scenarios, the heterogeneously-distributed joint radar-communication system is promising due to its flexibility and compatibility with existing architectures. In this paper, we focus on a heterogeneous radar and communication network (HRCN), which consists of various generic radars for multiple target tracking (MTT) and wireless communications for multiple users. We aim to improve MTT performance while maintaining good throughput levels for communication users through a well-designed resource allocation. The problem is formulated as a Bayesian Cramér-Rao bound (CRB) based minimization subject to resource budgets and throughput constraints. The resulting nonconvex problem is solved using an alternating descent-ascent approach. Numerical results demonstrate the efficacy of the proposed allocation scheme for this heterogeneous network.
Submitted 4 March, 2022; v1 submitted 29 July, 2021;
originally announced July 2021.
-
Efficient and Differentiable Shadow Computation for Inverse Problems
Authors:
Linjie Lyu,
Marc Habermann,
Lingjie Liu,
Mallikarjun B R,
Ayush Tewari,
Christian Theobalt
Abstract:
Differentiable rendering has received increasing interest for image-based inverse problems. It can benefit traditional optimization-based solutions to inverse problems, but also allows for self-supervision of learning-based approaches for which training data with ground truth annotation is hard to obtain. However, existing differentiable renderers either do not model the visibility of the light sources from different points in the scene, which is responsible for shadows in the images, or are too slow to be used for training deep architectures over thousands of iterations. To this end, we propose an accurate yet efficient approach for differentiable visibility and soft shadow computation. Our approach is based on spherical harmonics approximations of the scene illumination and visibility, where the occluding surface is approximated with spheres. This allows for significantly more efficient shadow computation compared to methods based on ray tracing. As our formulation is differentiable, it can be used to solve inverse problems such as texture, illumination, rigid pose, and geometric deformation recovery from images using analysis-by-synthesis optimization.
Submitted 1 April, 2021;
originally announced April 2021.
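The efficiency argument in this abstract rests on a standard property of spherical harmonics (SH). Suppose the incident radiance $L$ and the binary visibility $V$ at a surface point are both projected onto the same orthonormal real SH basis $\{Y_i\}$ up to band $n$ (that is, $n^2$ coefficients). Then the visibility-weighted integral of the light reduces to a dot product of coefficient vectors. The identity below is only the generic double-product rule; shading that also includes a cosine or BRDF term needs the corresponding triple-product form, and the abstract does not detail how the paper derives the coefficients $v_i$ from its sphere-approximated occluders.

```latex
L(\omega) \approx \sum_{i=1}^{n^2} l_i\, Y_i(\omega), \qquad
V(\omega) \approx \sum_{i=1}^{n^2} v_i\, Y_i(\omega), \qquad
\int_{S^2} L(\omega)\, V(\omega)\, d\omega \approx \sum_{i=1}^{n^2} l_i\, v_i
```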
-
PhotoApp: Photorealistic Appearance Editing of Head Portraits
Authors:
Mallikarjun B R,
Ayush Tewari,
Abdallah Dib,
Tim Weyrich,
Bernd Bickel,
Hans-Peter Seidel,
Hanspeter Pfister,
Wojciech Matusik,
Louis Chevallier,
Mohamed Elgharib,
Christian Theobalt
Abstract:
Photorealistic editing of portraits is a challenging task as humans are very sensitive to inconsistencies in faces. We present an approach for high-quality intuitive editing of the camera viewpoint and scene illumination in a portrait image. This requires our method to capture and control the full reflectance field of the person in the image. Most editing approaches rely on supervised learning using training data captured with setups such as light and camera stages. Such datasets are expensive to acquire, not readily available and do not capture all the rich variations of in-the-wild portrait images. In addition, most supervised approaches only focus on relighting, and do not allow camera viewpoint editing. Thus, they only capture and control a subset of the reflectance field. Recently, portrait editing has been demonstrated by operating in the generative model space of StyleGAN. While such approaches do not require direct supervision, there is a significant loss of quality when compared to the supervised approaches. In this paper, we present a method which learns from limited supervised training data. The training images only include people in a fixed neutral expression with eyes closed, without much hair or background variations. Each person is captured under 150 one-light-at-a-time conditions and under 8 camera poses. Instead of training directly in the image space, we design a supervised problem which learns transformations in the latent space of StyleGAN. This combines the best of supervised learning and generative adversarial modeling. We show that the StyleGAN prior allows for generalisation to different expressions, hairstyles and backgrounds. This produces high-quality photorealistic results for in-the-wild images and significantly outperforms existing methods. Our approach can edit the illumination and pose simultaneously, and runs at interactive rates.
Submitted 13 May, 2021; v1 submitted 13 March, 2021;
originally announced March 2021.
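One-light-at-a-time (OLAT) captures such as the 150 lighting conditions mentioned in this abstract sample a person's reflectance field. Because light transport is linear, one common way to use such data is to synthesize an image under new illumination as a weighted sum of the OLAT images, with one RGB weight per light (for example, sampled from an environment map). The sketch below shows only this generic linear relighting on tiny hypothetical images. It is not PhotoApp's method, which learns transformations in StyleGAN's latent space.

```python
def relight(olat_images, light_weights):
    """Relight by linearly combining one-light-at-a-time (OLAT) images.

    olat_images: one image per light, each a list of rows of (r, g, b) tuples.
    light_weights: one (r, g, b) intensity per light.
    """
    if len(olat_images) != len(light_weights):
        raise ValueError("need one weight per OLAT image")
    height, width = len(olat_images[0]), len(olat_images[0][0])
    out = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for image, weight in zip(olat_images, light_weights):
        for y in range(height):
            for x in range(width):
                for c in range(3):
                    out[y][x][c] += weight[c] * image[y][x][c]
    return [[tuple(px) for px in row] for row in out]


# Hypothetical 1x2 images: light 0 only reaches the left pixel, light 1 the right.
olat = [
    [[(0.2, 0.2, 0.2), (0.0, 0.0, 0.0)]],
    [[(0.0, 0.0, 0.0), (0.5, 0.4, 0.3)]],
]
weights = [(1.0, 0.5, 0.5), (2.0, 2.0, 2.0)]  # a warm dim light and a bright white light
print(relight(olat, weights))
```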
-
Learning Complete 3D Morphable Face Models from Images and Videos
Authors:
Mallikarjun B R,
Ayush Tewari,
Hans-Peter Seidel,
Mohamed Elgharib,
Christian Theobalt
Abstract:
Most 3D face reconstruction methods rely on 3D morphable models, which disentangle the space of facial deformations into identity geometry, expressions and skin reflectance. These models are typically learned from a limited number of 3D scans and thus do not generalize well across different identities and expressions. We present the first approach to learn complete 3D models of face identity geometry, albedo and expression just from images and videos. The virtually endless collection of such data, in combination with our self-supervised learning-based approach allows for learning face models that generalize beyond the span of existing approaches. Our network design and loss functions ensure a disentangled parameterization of not only identity and albedo, but also, for the first time, an expression basis. Our method also allows for in-the-wild monocular reconstruction at test time. We show that our learned models better generalize and lead to higher quality image-based reconstructions than existing approaches.
Submitted 4 October, 2020;
originally announced October 2020.
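Classical 3D morphable models of the kind described above are usually linear. A face mesh is a mean shape plus identity and expression offsets, and albedo is a mean plus its own basis. The paper learns such bases from images and videos. The sketch below only shows how a model of this linear form is evaluated once bases exist. The class name, field names and sizes are hypothetical, and the paper's learned model may not take exactly this form.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LinearFaceModel:
    """Hypothetical linear morphable model over V vertices."""
    mean_shape: np.ndarray    # (3V,) stacked xyz
    id_basis: np.ndarray      # (3V, K_id) identity geometry basis
    exp_basis: np.ndarray     # (3V, K_exp) expression basis
    mean_albedo: np.ndarray   # (3V,) stacked per-vertex rgb
    albedo_basis: np.ndarray  # (3V, K_alb)

    def geometry(self, alpha, delta):
        """Vertices = mean + identity offset + expression offset, as (V, 3)."""
        v = self.mean_shape + self.id_basis @ alpha + self.exp_basis @ delta
        return v.reshape(-1, 3)

    def albedo(self, beta):
        """Per-vertex albedo = mean + albedo offset, as (V, 3)."""
        return (self.mean_albedo + self.albedo_basis @ beta).reshape(-1, 3)


rng = np.random.default_rng(1)
V, K_ID, K_EXP, K_ALB = 5, 4, 3, 4  # toy sizes
model = LinearFaceModel(
    mean_shape=rng.normal(size=3 * V),
    id_basis=rng.normal(size=(3 * V, K_ID)),
    exp_basis=rng.normal(size=(3 * V, K_EXP)),
    mean_albedo=rng.random(3 * V),
    albedo_basis=0.1 * rng.normal(size=(3 * V, K_ALB)),
)
neutral = model.geometry(np.zeros(K_ID), np.zeros(K_EXP))
expressive = model.geometry(np.zeros(K_ID), np.array([1.0, 0.0, 0.0]))
```

Keeping identity, expression and albedo in separate bases is what makes the parameterization disentangled: changing the expression coefficients leaves identity and albedo untouched.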
-
PIE: Portrait Image Embedding for Semantic Control
Authors:
Ayush Tewari,
Mohamed Elgharib,
Mallikarjun B R.,
Florian Bernard,
Hans-Peter Seidel,
Patrick Pérez,
Michael Zollhöfer,
Christian Theobalt
Abstract:
Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, however only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.
Submitted 20 September, 2020;
originally announced September 2020.
-
Monocular Reconstruction of Neural Face Reflectance Fields
Authors:
Mallikarjun B R.,
Ayush Tewari,
Tae-Hyun Oh,
Tim Weyrich,
Bernd Bickel,
Hans-Peter Seidel,
Hanspeter Pfister,
Wojciech Matusik,
Mohamed Elgharib,
Christian Theobalt
Abstract:
The reflectance field of a face describes the reflectance properties responsible for complex lighting effects, including diffuse and specular reflection, inter-reflection and self-shadowing. Most existing methods for estimating face reflectance from a monocular image assume faces to be diffuse, with very few approaches adding a specular component. This still leaves out important perceptual aspects of reflectance, as higher-order global illumination effects and self-shadowing are not modeled. We present a new neural representation for face reflectance with which we can estimate, from a single monocular image, all components of the reflectance responsible for the final appearance. Instead of modeling each component of the reflectance separately using parametric models, our neural representation allows us to generate a basis set of faces in a geometric deformation-invariant space, parameterized by the input light direction, viewpoint and face geometry. We learn to reconstruct this reflectance field of a face from just a monocular image, which can be used to render the face from any viewpoint in any light condition. Our method is trained on a light-stage training dataset, which captures 300 people illuminated with 150 light conditions from 8 viewpoints. We show that our method outperforms existing monocular reflectance reconstruction methods in terms of photorealism, owing to better capture of physical primitives such as sub-surface scattering, specularities, self-shadows and other higher-order effects.
Submitted 24 August, 2020;
originally announced August 2020.
-
Joint User Grouping, Scheduling, and Precoding for Multicast Energy Efficiency in Multigroup Multicast Systems
Authors:
Ashok Bandi,
Bhavani Shankar Mysore R,
Symeon Chatzinotas,
Björn Ottersten
Abstract:
This paper studies the joint design of user grouping, scheduling (or admission control) and precoding to optimize energy efficiency (EE) for multigroup multicast scenarios in single-cell multiuser MISO downlink channels. Noting that the existing definition of EE fails to account for group sizes, we propose a new metric called multicast energy efficiency (MEE). In this context, the joint design is considered for the maximization of MEE, EE, and the number of scheduled users. First, with the help of binary variables (associated with grouping and scheduling), the joint design problem is formulated as a mixed-Boolean fractional programming problem, which facilitates the joint update of grouping, scheduling and precoding variables. Further, several novel optimization formulations are proposed to reveal the hidden difference-of-convex/concave structure in the objective and associated constraints. Thereafter, we propose an iterative algorithm based on the convex-concave procedure framework for each optimization criterion, where grouping, scheduling, and precoding variables are updated jointly in each iteration. Finally, we compare the performance of the three design criteria with respect to three performance metrics, namely MEE, EE, and the number of scheduled users, through Monte-Carlo simulations. These simulations establish the need for MEE and the improvement from the system optimization.
Submitted 14 May, 2020;
originally announced May 2020.
-
Team Mountaineers Space Robotic Challenge Phase-2 Qualification Round Preparation Report
Authors:
Cagri Kilic,
Christopher A. Tatsch,
Bernardo Martinez R. Jr,
Jared J. Beard,
Derek W. Ross,
Jason N. Gross
Abstract:
Team Mountaineers launched efforts on the NASA Space Robotics Challenge Phase-2 (SRC2). The challenge will be held on the lunar terrain with virtual robotic platforms to establish an in-situ resource utilization process. In this report, we provide an overview of a simulation environment, a virtual mobile robot, and a software architecture that was created by Team Mountaineers in order to prepare for the competition's qualification round before the competition environment was released.
Submitted 22 March, 2020;
originally announced March 2020.
-
Design and Development of Underwater Vehicle: ANAHITA
Authors:
Akash Jain,
Manish Kumar,
Rithvik Patibandla,
Balamurugan R,
Naveen Chandra R,
Abhinav Arora,
Akash K Singh,
Varun Pawar,
Aditya Rai,
Medha Agarwal,
Priank Prasad,
Vandit Sanadhya,
Prateek Yadav,
Inshu Namdev,
Nilay Shah,
Saksham Mittal,
Ayush Gupta,
Naman Agarwal,
Mangal Kothari
Abstract:
Anahita is an autonomous underwater vehicle (AUV) currently being developed by an interdisciplinary team of students at the Indian Institute of Technology (IIT) Kanpur, with the aim of providing undergraduate students with a platform for AUV research. It is the second vehicle designed by the AUV-IITK team, built to participate in the 6th NIOT-SAVe competition organized by the National Institute of Ocean Technology, Chennai. The vehicle has been completely redesigned, with major improvements in the modularity and ease of access of all components, while keeping the design compact and efficient. New additions to the vehicle include a power distribution system and a monitoring system. The sensors include inertial measurement units (IMUs), a hydrophone array, a depth sensor, and two RGB cameras. The current vehicle features hot-swappable battery pods, a major advantage over the previous vehicle because they allow a longer runtime.
Submitted 1 October, 2021; v1 submitted 1 March, 2019;
originally announced March 2019.
-
RNNSecureNet: Recurrent neural networks for Cyber security use-cases
Authors:
Mohammed Harun Babu R,
Vinayakumar R,
Soman KP
Abstract:
The recurrent neural network (RNN) is effective at solving very complex supervised and unsupervised tasks. RNNs have brought significant improvements in fields such as natural language processing, speech processing, computer vision and multiple other domains. This paper applies RNNs to different use cases such as Incident Detection, Fraud Detection, and Android Malware Classification. The best-performing neural network architecture is chosen by conducting a series of experiments over different network parameters and structures. The network is run for up to 1000 epochs, with the learning rate set in the range of 0.01 to 0.5. RNNs performed very well compared to classical machine learning algorithms. This is mainly because RNNs implicitly extract the underlying features and also identify the characteristics of the data, which helps achieve better accuracy.
Submitted 5 January, 2019;
originally announced January 2019.
-
The Effect of Introducing Redundancy in a Probabilistic Forwarding Protocol
Authors:
Vinay Kumar B. R.,
Roshan Antony,
Navin Kashyap
Abstract:
This paper is concerned with the problem of broadcasting information from a source node to every node in an ad-hoc network. Flooding, as a broadcast mechanism, involves each node forwarding any packet it receives to all its neighbours. This results in excessive transmissions and thus a high energy expenditure overall. Probabilistic forwarding or gossiping involves each node forwarding a received packet to all its neighbours only with a certain probability $p$. In this paper, we study the effect of introducing redundancy, in the form of coded packets, into a probabilistic forwarding protocol. Specifically, we assume that the source node has $k$ data packets to broadcast, which are encoded into $n \ge k$ coded packets, such that any $k$ of these coded packets are sufficient to recover the original $k$ data packets. Our interest is in determining the minimum forwarding probability $p$ for a "successful broadcast", which we take to be the event that the expected fraction of network nodes that receive at least $k$ of the $n$ coded packets is close to 1. We examine, via simulations and analysis of a number of different network topologies (e.g., trees, grids, random geometric graphs), how this minimum forwarding probability, and correspondingly, the expected total number of packet transmissions varies with the amount of redundancy added. Our simulation results indicate that over network topologies that are highly connected, the introduction of redundancy into the probabilistic forwarding protocol is useful, as it can significantly reduce the expected total number of transmissions needed for a successful broadcast. On the other hand, for trees, our analysis shows that the expected total number of transmissions needed increases with redundancy.
Submitted 10 January, 2019; v1 submitted 7 January, 2019;
originally announced January 2019.
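The protocol in the abstract above can be explored with a small Monte Carlo simulation. This is a minimal sketch under stated assumptions, not the authors' simulator. Links are assumed erasure-free. The source always transmits each of the $n$ coded packets once. Every node that receives a packet for the first time re-broadcasts it to all neighbours with probability $p$. Any $k$ of the $n$ packets are assumed to suffice for decoding (an MDS-style code). The grid topology and all function names are illustrative.

```python
import random
from collections import deque


def grid_graph(rows, cols):
    """Adjacency lists of a rows x cols grid; node id = r * cols + c."""
    adj = {r * cols + c: [] for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:  # right neighbour
                adj[u].append(u + 1)
                adj[u + 1].append(u)
            if r + 1 < rows:  # lower neighbour
                adj[u].append(u + cols)
                adj[u + cols].append(u)
    return adj


def forward_one_packet(adj, source, p, rng):
    """Gossip one packet; return (nodes that received it, number of broadcasts)."""
    received = {source}
    to_transmit = deque([source])  # the source always transmits
    transmissions = 0
    while to_transmit:
        u = to_transmit.popleft()
        transmissions += 1  # one broadcast reaches every neighbour of u
        for v in adj[u]:
            if v not in received:
                received.add(v)
                if rng.random() < p:  # first-time receiver forwards with prob. p
                    to_transmit.append(v)
    return received, transmissions


def simulate(adj, source, n, k, p, trials=200, seed=0):
    """Estimate the expected fraction of nodes holding >= k of n coded packets
    and the expected total number of transmissions."""
    rng = random.Random(seed)
    total_frac = total_tx = 0.0
    for _ in range(trials):
        counts = dict.fromkeys(adj, 0)
        for _ in range(n):  # each coded packet spreads independently
            got, tx = forward_one_packet(adj, source, p, rng)
            total_tx += tx
            for v in got:
                counts[v] += 1
        total_frac += sum(c >= k for c in counts.values()) / len(adj)
    return total_frac / trials, total_tx / trials


if __name__ == "__main__":
    grid = grid_graph(10, 10)
    for n in (4, 6):  # k = 4 data packets; n = 6 adds two redundant packets
        frac, tx = simulate(grid, source=0, n=n, k=4, p=0.7)
        print(f"n={n}: fraction={frac:.3f}, transmissions={tx:.1f}")
```

To estimate the minimum forwarding probability for each $n$, one could sweep $p$ and record the smallest value whose estimated fraction exceeds a chosen threshold, such as 0.99. The transmission count at that $p$ then gives the cost of a successful broadcast.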
-
A short review on Applications of Deep learning for Cyber security
Authors:
Mohammed Harun Babu R,
Vinayakumar R,
Soman KP
Abstract:
Deep learning is an advanced form of traditional machine learning. It can extract optimal feature representations from raw input samples, and it has been applied to various use cases in cyber security, such as intrusion detection, malware classification, Android malware detection, spam and phishing detection, and binary analysis. This paper surveys works related to deep learning-based solutions for various cyber security use cases. Keywords: Deep learning, intrusion detection, malware detection, Android malware detection, spam & phishing detection, traffic analysis, binary analysis.
Submitted 29 January, 2019; v1 submitted 15 December, 2018;
originally announced December 2018.
-
Overlapping community detection using superior seed set selection in social networks
Authors:
Belfin R V,
E. Grace Mary Kanaga,
Piotr Bródka
Abstract:
Community discovery in social networks is a rapidly expanding area that has attracted researchers' interest over the past decade. Many algorithms already exist. However, seed-based algorithms are an emerging trend in this area. The basic idea behind these strategies is to identify exceptional nodes in the given network, called seeds, around which communities can be located. This paper proposes a blended strategy for locating a suitable superior seed set by applying various centrality measures, and uses the seeds to find overlapping communities. The algorithm is evaluated on the quality of the identified communities using intra-cluster density and inter-cluster density. Finally, the runtime of the proposed algorithm is compared with that of existing community detection algorithms, showing remarkable improvement.
Submitted 10 August, 2018;
originally announced August 2018.
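The seed-then-expand idea and the two evaluation measures in the abstract above can be sketched in a few lines of Python. This is a simplified, hypothetical illustration. Seeds are ranked by degree centrality only, whereas the paper blends several centrality measures. Each community is a seed plus its neighbours, so a node adjacent to two seeds lands in both communities. The densities use the common definitions: internal edges over the n_c(n_c - 1)/2 possible internal pairs, and boundary edges over the n_c(n - n_c) possible boundary pairs.

```python
from itertools import combinations


def select_seeds(adj, num_seeds):
    """Pick high-degree nodes as seeds, skipping neighbours of chosen seeds."""
    chosen = []
    for v in sorted(adj, key=lambda u: (-len(adj[u]), u)):
        if len(chosen) >= num_seeds:
            break
        if all(v not in adj[s] for s in chosen):
            chosen.append(v)
    return chosen


def grow_communities(adj, seeds):
    """Each community is a seed plus its neighbours; communities may overlap."""
    return [{s} | set(adj[s]) for s in seeds]


def intra_density(adj, community):
    """Internal edges divided by the n_c * (n_c - 1) / 2 possible internal pairs."""
    n_c = len(community)
    if n_c < 2:
        return 0.0
    internal = sum(1 for u, v in combinations(community, 2) if v in adj[u])
    return internal / (n_c * (n_c - 1) / 2)


def inter_density(adj, community):
    """Boundary edges divided by the n_c * (n - n_c) possible boundary pairs."""
    n_c, n = len(community), len(adj)
    if n_c == n:
        return 0.0
    external = sum(1 for u in community for v in adj[u] if v not in community)
    return external / (n_c * (n - n_c))


# Two triangles joined by the edge 2-3 (adjacency as sets of neighbours).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
seeds = select_seeds(adj, 2)
communities = grow_communities(adj, seeds)
for seed, com in zip(seeds, communities):
    print(seed, sorted(com), intra_density(adj, com), inter_density(adj, com))
```

In this toy graph the seeds are nodes 2 and 4. Node 3 is adjacent to both, so it appears in both communities, which is the kind of overlap the abstract targets. A good community scores high intra-cluster density and low inter-cluster density.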
-
SwiDeN : Convolutional Neural Networks For Depiction Invariant Object Recognition
Authors:
Ravi Kiran Sarvadevabhatla,
Shiv Surya,
Srinivas S S Kruthiventi,
Venkatesh Babu R
Abstract:
Current state-of-the-art object recognition architectures achieve impressive performance but are typically specialized for a single depictive style (e.g., photos only, sketches only). In this paper, we present SwiDeN, our Convolutional Neural Network (CNN) architecture, which recognizes objects regardless of how they are visually depicted (line drawing, realistic shaded drawing, photograph, etc.). In SwiDeN, we utilize a novel 'deep' depictive style-based switching mechanism which appropriately addresses the depiction-specific and depiction-invariant aspects of the problem. We compare SwiDeN with alternative architectures and prior work on a 50-category Photo-Art dataset containing objects depicted in multiple styles. Experimental results show that SwiDeN outperforms other approaches for the depiction-invariant object recognition problem.
Submitted 29 July, 2016;
originally announced July 2016.
-
Analyzing structural characteristics of object category representations from their semantic-part distributions
Authors:
Ravi Kiran Sarvadevabhatla,
Venkatesh Babu R
Abstract:
Studies from neuroscience show that part-mapping computations are employed by the human visual system in the process of object recognition. In this work, we present an approach for analyzing semantic-part characteristics of object category representations. For our experiments, we use category-epitome, a recently proposed sketch-based spatial representation for objects. To enable part-importance analysis, we first obtain semantic-part annotations of the hand-drawn sketches originally used to construct the corresponding epitomes. We then examine the extent to which the semantic parts are present in the epitomes of a category and visualize the relative importance of parts as a word cloud. Finally, we show how such word cloud visualizations provide an intuitive understanding of category-level structural trends that exist in the category-epitome object representations.
Submitted 15 September, 2015;
originally announced September 2015.
-
Towards Refactoring of DMARF and GIPSY Case Studies -- A Team 5 SOEN6471-S14 Project Report
Authors:
Pavan Kumar Polu,
Amjad Al Najjar,
Biswajit Banik,
Ajay Sujit Kumar,
Gustavo Pereira,
Prince Japhlet,
Bhanu Prakash R.,
Sabari Krishna Raparla
Abstract:
This paper presents an analysis of the architectural design of two distributed open source systems (OSS) developed in Java: Distributed Modular Audio Recognition Framework (DMARF) and General Intensional Programming System (GIPSY). The research starts with a background study of these frameworks to determine their overall architectures. Afterwards, we identify the actors and stakeholders and draft a domain model for each framework. Next, we evaluate and propose a fused DMARF over GIPSY Run-time Architecture (DoGRTA) as a domain concept. Later on, we extract and study the actual class diagrams and determine classes of interest. Next, we identify design patterns present within the code of each framework. Finally, code smells in the source code are detected using popular tools, and a selected number of the identified smells are refactored using established techniques and implemented in the final source code. Tests are written and run before and after the refactoring to check for any behavioral changes.
Submitted 23 December, 2014;
originally announced December 2014.
-
Interference Mitigating Satellite Broadcast Receiver using Reduced Complexity List-Based Detection in Correlated Noise
Authors:
Zohair Abu-Shaban,
Hani Mehrpouyan,
Bhavani Shankar M. R.,
Bjorn Ottersten
Abstract:
The recent commercial trends towards using smaller dish antennas for satellite receivers, and the growing density of broadcasting satellites, necessitate the application of robust adjacent satellite interference (ASI) cancellation schemes. This orbital density growth along with the wider beamwidth of a smaller dish have imposed an overloaded scenario at the satellite receiver, where the number of transmitting satellites exceeds the number of receiving elements at the dish antenna. To ensure successful operation in this practical scenario, we propose a satellite receiver that enhances signal detection from the desired satellite by mitigating the interference from neighboring satellites. Towards this objective, we propose a reduced complexity list-based group-wise search detection (RC-LGSD) receiver under the assumption of spatially correlated additive noise. To further enhance detection performance, the proposed satellite receiver utilizes a newly designed whitening filter to remove the spatial correlation amongst the noise parameters, while also applying a preprocessor that maximizes the signal-to-interference-plus-noise ratio (SINR). Extensive simulations under practical scenarios show that the proposed receiver enhances the performance of satellite broadcast systems in the presence of ASI compared to existing methods.
Submitted 25 April, 2014;
originally announced April 2014.
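The whitening step mentioned in the abstract above can be illustrated with the textbook Cholesky-based construction. This is a minimal sketch, assuming the noise covariance matrix $R$ is known and positive definite. It is not necessarily the newly designed filter from the paper. With $R = LL^H$, the filter $W = L^{-1}$ gives $WRW^H = I$, so applying $W$ to $y = Hs + n$ yields white noise and an equivalent channel $WH$. The function names and the two-antenna, three-satellite example are hypothetical.

```python
import numpy as np


def whitening_filter(noise_cov):
    """Return W with W @ R @ W^H = I, using the Cholesky factor R = L L^H."""
    chol = np.linalg.cholesky(noise_cov)  # lower-triangular L
    return np.linalg.inv(chol)  # W = L^{-1}


def whiten(y, channel, noise_cov):
    """Whiten y = H s + n; return (W y, W H) so the noise term W n is white."""
    w = whitening_filter(noise_cov)
    return w @ y, w @ channel


# Hypothetical example: 2 receive antennas, 3 satellites (overloaded).
rng = np.random.default_rng(2)
noise_cov = np.array([[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]])  # Hermitian, positive definite
channel = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
symbols = np.array([1, -1, 1], dtype=complex)
white = (rng.normal(size=2) + 1j * rng.normal(size=2)) / np.sqrt(2)  # CN(0, I) sample
noise = np.linalg.cholesky(noise_cov) @ white  # correlated noise with covariance R
y = channel @ symbols + noise
y_white, channel_white = whiten(y, channel, noise_cov)
```

After whitening, a detector designed for white noise can operate on `y_white` with the equivalent channel `channel_white`.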
-
Enhanced List-Based Group-Wise Overloaded Receiver with Application to Satellite Reception
Authors:
Zohair Abu-Shaban,
Bhavani Shankar M. R,
Hani Mehrpouyan,
Bjorn Ottersten
Abstract:
The market trends towards the use of smaller dish antennas for TV satellite receivers, as well as the growing density of broadcasting satellites in orbit, require the application of robust adjacent satellite interference (ASI) cancellation algorithms at the receivers. The wider beamwidth of a small dish and the growing number of satellites in orbit impose an overloaded scenario, i.e., a scenario where the number of transmitting satellites exceeds the number of receiving antennas. For such a scenario, we present a two-stage receiver that enhances signal detection from the satellite of interest, i.e., the satellite the dish is pointing to, while reducing interference from neighboring satellites. Towards this objective, we propose an enhanced List-based Group-wise Search Detection (LGSD) receiver architecture that takes into account the spatially correlated additive noise and uses the signal-to-interference-plus-noise ratio (SINR) maximization criterion to improve detection performance. Simulations show that the proposed receiver structure enhances the performance of satellite systems in the presence of ASI compared to existing methods.
Submitted 17 April, 2014;
originally announced April 2014.
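The SINR maximization criterion mentioned in the abstract above has a well-known closed form for a single linear combiner. Given the interference-plus-noise covariance $R_{in}$ and desired channel $h$, the weight $w = R_{in}^{-1}h$ maximizes $|w^H h|^2 / (w^H R_{in} w)$, reaching $h^H R_{in}^{-1} h$. The sketch below shows only this generic max-SINR filter, assuming unit-power symbols and known channels. It is not the paper's LGSD receiver, and the names are illustrative.

```python
import numpy as np


def interference_plus_noise_cov(interferers, noise_cov):
    """R_in = G G^H + R_n for unit-power interfering symbols (columns of G)."""
    return interferers @ interferers.conj().T + noise_cov


def sinr(w, h, interferers, noise_cov):
    """SINR of the linear combiner w for desired channel h."""
    r_in = interference_plus_noise_cov(interferers, noise_cov)
    signal = abs(np.vdot(w, h)) ** 2  # |w^H h|^2
    return signal / np.real(np.vdot(w, r_in @ w))  # divided by w^H R_in w


def max_sinr_combiner(h, interferers, noise_cov):
    """w = R_in^{-1} h, the classical SINR-maximizing linear filter."""
    return np.linalg.solve(interference_plus_noise_cov(interferers, noise_cov), h)


# Hypothetical overloaded case: 2 antennas, 1 desired satellite, 2 interferers.
rng = np.random.default_rng(3)
h = rng.normal(size=2) + 1j * rng.normal(size=2)
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
noise_cov = 0.1 * np.eye(2)
w_opt = max_sinr_combiner(h, g, noise_cov)
print(sinr(w_opt, h, g, noise_cov), sinr(h, h, g, noise_cov))  # max-SINR vs matched filter
```

With two antennas, a linear filter can null at most one interferer, so with two interferers some residual interference always remains. This illustrates the difficulty of the overloaded setting the abstract describes.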
-
The Best Answers? Think Twice: Online Detection of Commercial Campaigns in the CQA Forums
Authors:
Cheng Chen,
Kui Wu,
Venkatesh Srinivasan,
Kesav Bharadwaj R
Abstract:
In an emerging trend, more and more Internet users search for information on Community Question and Answer (CQA) websites, as interactive communication on such websites gives users a rare feeling of trust. More often than not, end users look for instant help when they browse CQA websites for the best answers. Hence, it is imperative that they be warned of any potential commercial campaigns hidden behind the answers. However, existing research focuses more on the quality of answers and does not meet this need. In this paper, we develop a system that automatically analyzes the hidden patterns of commercial spam and instantly raises alarms to end users whenever a potential commercial campaign is detected. Our detection method integrates semantic analysis with posters' track records and exploits special features of CQA websites that differ substantially from those of other types of forums, such as microblogs or news reports. Our system is adaptive and accommodates new evidence uncovered by the detection algorithms over time. Validated with real-world trace data from a popular Chinese CQA website over a period of three months, our system shows great potential for adaptive online detection of CQA spam.
Submitted 5 January, 2013; v1 submitted 7 August, 2012;
originally announced August 2012.