
Showing 1–50 of 55 results for author: Molchanov, P

  1. arXiv:2406.10260  [pdf, other]

    cs.CL cs.LG

    Flextron: Many-in-One Flexible Large Language Model

    Authors: Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov

    Abstract: Training modern LLMs is extremely resource-intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. The Flextron architecture utilizes a nested elasti…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2406.04484  [pdf, ps, other]

    cs.CV

    Step Out and Seek Around: On Warm-Start Training with Incremental Data

    Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jose M. Alvarez

    Abstract: Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving. When new training data is available, training the model from scratch undermines the benefit of leveraging the learned knowledge, leading to significant training costs. Warm-starting from a previously trained checkpoint is the most intuitive way to retain knowledge and advance learning. How…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2405.19335  [pdf, other]

    cs.CV cs.CL cs.LG

    X-VILA: Cross-Modality Alignment for Large Language Model

    Authors: Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

    Abstract: We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LLM outputs, X-VILA achieves cross-modality understanding, reasoning, and generation. To facilitate this cross-modality alignment, we curate an effectiv…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Technical Report

  4. arXiv:2403.19046  [pdf, other]

    cs.CV cs.AI

    LITA: Language Instructed Temporal-Localization Assistant

    Authors: De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz

    Abstract: There has been tremendous progress in multimodal Large Language Models (LLMs). Recent works have extended these models to video input with promising instruction following capabilities. However, an important missing piece is temporal localization. These models cannot accurately answer the "When?" questions. We identify three key aspects that limit their temporal localization capabilities: (i) time…

    Submitted 27 March, 2024; originally announced March 2024.

  5. arXiv:2402.09353  [pdf, other]

    cs.CL cs.CV

    DoRA: Weight-Decomposed Low-Rank Adaptation

    Authors: Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen

    Abstract: Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, an accuracy gap often still exists between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and…

    Submitted 3 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Code available at https://github.com/NVlabs/DoRA

  6. arXiv:2312.07533  [pdf, other]

    cs.CV

    VILA: On Pre-training for Visual Language Models

    Authors: Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han

    Abstract: Visual language models (VLMs) have progressed rapidly with the recent success of large language models. There have been growing efforts on visual instruction tuning to extend the LLM with visual inputs, but in-depth study of the visual language pre-training process, where the model learns to perform joint modeling on both modalities, is lacking. In this work, we examine the design options for VLM pre-trai…

    Submitted 16 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  7. arXiv:2312.06709  [pdf, other]

    cs.CV

    AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

    Authors: Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov

    Abstract: A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks. VFMs like CLIP, DINOv2, and SAM are trained with distinct objectives, exhibiting unique characteristics for various downstream tasks. We find that despite their conceptual differences, these models can be effectively merged into a unified model through multi-teacher distillation. We name…

    Submitted 30 April, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Version 3: CVPR Camera Ready, reconfigured full paper, table 1 is now more comprehensive. Version 2: Added more acknowledgements and updated table 7 with more recent results. Ensured that the link in the abstract to our code is working properly. Version 3: Fix broken hyperlinks

  8. arXiv:2310.13768  [pdf, other]

    cs.CV

    PACE: Human and Camera Motion Estimation from in-the-wild Videos

    Authors: Muhammed Kocabas, Ye Yuan, Pavlo Molchanov, Yunrong Guo, Michael J. Black, Otmar Hilliges, Jan Kautz, Umar Iqbal

    Abstract: We present a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the coupling of human and camera motions in the video. To address this problem, we propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features. Unlike existing methods that use SLAM…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 3DV 2024. Project page: https://nvlabs.github.io/PACE/

  9. arXiv:2306.14306  [pdf, other]

    cs.LG cs.CV

    Adaptive Sharpness-Aware Pruning for Robust Sparse Networks

    Authors: Anna Bair, Hongxu Yin, Maying Shen, Pavlo Molchanov, Jose Alvarez

    Abstract: Robustness and compactness are two essential attributes of deep learning models that are deployed in the real world. The goals of robustness and compactness may seem to be at odds, since robustness requires generalization across domains, while the process of compression exploits specificity in one domain. We introduce Adaptive Sharpness-Aware Pruning (AdaSAP), which unifies these goals through the…

    Submitted 13 March, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

  10. arXiv:2306.08593  [pdf, other]

    cs.CV cs.LG

    Heterogeneous Continual Learning

    Authors: Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov

    Abstract: We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures. Most CL methods focus on adapting a single architecture to a new task/class by modifying its weights. However, with rapid progress in architecture design, the problem of adapting existing solutions to novel architectures becomes relevant. To address this limitation, we pro…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023

  11. arXiv:2306.06189  [pdf, other]

    cs.CV cs.AI cs.LG

    FasterViT: Fast Vision Transformers with Hierarchical Attention

    Authors: Ali Hatamizadeh, Greg Heinrich, Hongxu Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov

    Abstract: We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. FasterViT combines the benefits of fast local representation learning in CNNs and global modeling properties in ViT. Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention with quadratic complexity into a multi-…

    Submitted 1 April, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: ICLR'24 Accepted Paper

  12. arXiv:2304.00600  [pdf, other]

    cs.CV cs.LG

    Recurrence without Recurrence: Stable Video Landmark Detection with Deep Equilibrium Models

    Authors: Paul Micaelli, Arash Vahdat, Hongxu Yin, Jan Kautz, Pavlo Molchanov

    Abstract: Cascaded computation, whereby predictions are recurrently refined over several stages, has been a persistent theme throughout the development of landmark detection models. In this work, we show that the recently proposed Deep Equilibrium Model (DEQ) can be naturally adapted to this form of computation. Our Landmark DEQ (LDEQ) achieves state-of-the-art performance on the challenging WFLW facial lan…

    Submitted 2 April, 2023; originally announced April 2023.

  13. arXiv:2212.03237  [pdf, other]

    cs.CV

    RANA: Relightable Articulated Neural Avatars

    Authors: Umar Iqbal, Akin Caliskan, Koki Nagano, Sameh Khamis, Pavlo Molchanov, Jan Kautz

    Abstract: We propose RANA, a relightable and articulated neural avatar for the photorealistic synthesis of humans under arbitrary viewpoints, body poses, and lighting. We only require a short video clip of the person to create the avatar and assume no knowledge about the lighting environment. We present a novel framework to model humans while disentangling their geometry, texture, and also lighting environm…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: project page: https://nvlabs.github.io/RANA/

  14. arXiv:2211.05648  [pdf, other]

    physics.acc-ph physics.plasm-ph

    Towards High-Power Microwaves

    Authors: S. Anishchenko, V. Baryshevsky, A. Gurinovich, E. Gurnevich, P. Molchanov, A. Rouba

    Abstract: In this paper, we review and compare HPM sources operating without a magnetic field to guide the electron beam that are capable of producing high-power microwave (HPM) pulses with a duration of about 100 ns. The proposed analysis summarizes multi-year research carried out with three types of HPM sources: a split-cavity oscillator (SCO); an axial vircator; and a virtual cathode oscillator in Reflex Tri…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 18 pages, 19 figures; To be presented at GlobalEM 2022

  15. arXiv:2210.06659  [pdf, other]

    cs.CV

    Structural Pruning via Latency-Saliency Knapsack

    Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez

    Abstract: Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget on the target device. For filter importance ranking, HALP leverages a latency lookup table to tr…

    Submitted 18 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted by NeurIPS 2022. arXiv admin note: substantial text overlap with arXiv:2110.10811

  16. arXiv:2206.09959  [pdf, other]

    cs.CV cs.AI cs.LG

    Global Context Vision Transformers

    Authors: Ali Hatamizadeh, Hongxu Yin, Greg Heinrich, Jan Kautz, Pavlo Molchanov

    Abstract: We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision. Our method leverages global context self-attention modules, joint with standard local self-attention, to effectively and efficiently model both long and short-range spatial interactions, without the need for expensive operations such as computing attentio…

    Submitted 6 June, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: Accepted to ICML 2023

  17. arXiv:2203.15798  [pdf, other]

    cs.CV

    DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance Fields for Articulated Avatars

    Authors: Amit Raj, Umar Iqbal, Koki Nagano, Sameh Khamis, Pavlo Molchanov, James Hays, Jan Kautz

    Abstract: Acquisition and creation of digital human avatars is an important problem with applications to virtual telepresence, gaming, and human modeling. Most contemporary approaches for avatar generation can be viewed either as 3D-based methods, which use multi-view data to learn a 3D representation with appearance (such as a mesh, implicit surface, or volume), or 2D-based methods which learn photo-realis…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Project page at https://dracon-avatars.github.io/

  18. arXiv:2203.11894  [pdf, other]

    cs.CV cs.AI cs.CR cs.DC cs.LG

    GradViT: Gradient Inversion of Vision Transformers

    Authors: Ali Hatamizadeh, Hongxu Yin, Holger Roth, Wenqi Li, Jan Kautz, Daguang Xu, Pavlo Molchanov

    Abstract: In this work we demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks. During this attack, the original data batch is reconstructed given model weights and the corresponding gradients. We introduce a method, named GradViT, that optimizes random noise into natural-looking images via an iterative process. The optimization objective consists of (i) a loss o…

    Submitted 27 March, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: CVPR'22 Accepted Paper

  19. arXiv:2202.06924  [pdf, other]

    cs.LG cs.CR cs.CV cs.DC

    Do Gradient Inversion Attacks Make Federated Learning Unsafe?

    Authors: Ali Hatamizadeh, Hongxu Yin, Pavlo Molchanov, Andriy Myronenko, Wenqi Li, Prerna Dogra, Andrew Feng, Mona G. Flores, Jan Kautz, Daguang Xu, Holger R. Roth

    Abstract: Federated learning (FL) allows the collaborative training of AI models without needing to share raw data. This capability makes it especially interesting for healthcare applications where patient and data privacy is of utmost concern. However, recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training da…

    Submitted 30 January, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Revised version; Accepted to IEEE Transactions on Medical Imaging; Improved and reformatted version of https://www.researchsquare.com/article/rs-1147182/v2; Added NVFlare reference

  20. arXiv:2112.07658  [pdf, other]

    cs.CV cs.LG

    AdaViT: Adaptive Tokens for Efficient Vision Transformer

    Authors: Hongxu Yin, Arash Vahdat, Jose Alvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov

    Abstract: We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity. A-ViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds. We reformulate Adaptive Computation Time (ACT) for this task, extending halting to discard redundant spatial tokens.…

    Submitted 5 October, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: CVPR'22 oral acceptance

  21. arXiv:2112.01524  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras

    Authors: Ye Yuan, Umar Iqbal, Pavlo Molchanov, Kris Kitani, Jan Kautz

    Abstract: We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras. Our approach is robust to severe and long-term occlusions and tracks human bodies even when they go outside the camera's field of view. To achieve this, we first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible m…

    Submitted 30 March, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 (Oral). Project page: https://nvlabs.github.io/GLAMR

  22. arXiv:2110.12007  [pdf, other]

    cs.CV cs.LG

    When to Prune? A Policy towards Early Structural Pruning

    Authors: Maying Shen, Pavlo Molchanov, Hongxu Yin, Jose M. Alvarez

    Abstract: Pruning enables appealing reductions in network memory footprint and time complexity. Conventional post-training pruning techniques lean towards efficient inference while overlooking the heavy computation for training. Recent exploration of pre-training pruning at initialization hints at training cost reduction via pruning, but suffers noticeable performance degradation. We attempt to combine the…

    Submitted 22 October, 2021; originally announced October 2021.

  23. arXiv:2110.10811  [pdf, ps, other]

    cs.CV cs.LG

    HALP: Hardware-Aware Latency Pruning

    Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez

    Abstract: Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget. For filter importance ranking, HALP leverages a latency lookup table to track latency reductio…

    Submitted 20 October, 2021; originally announced October 2021.

  24. arXiv:2110.04869  [pdf, other]

    cs.CV

    Global Vision Transformer Pruning with Hessian-Aware Saliency

    Authors: Huanrui Yang, Hongxu Yin, Maying Shen, Pavlo Molchanov, Hai Li, Jan Kautz

    Abstract: Transformers yield state-of-the-art results across many tasks. However, their heuristically designed architectures impose huge computational costs during inference. This work aims to challenge the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage, where we redistribute the parameters both across transformer blocks…

    Submitted 29 March, 2023; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Accepted as a conference paper at CVPR 2023

  25. arXiv:2107.10624  [pdf, other]

    cs.CV cs.AI cs.LG

    LANA: Latency Aware Network Acceleration

    Authors: Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat

    Abstract: We introduce latency-aware network acceleration (LANA), an approach that builds on neural architecture search techniques and teacher-student distillation to accelerate neural networks. LANA consists of two phases: in the first phase, it trains many alternative operations for every layer of the teacher network using layer-wise feature map distillation. In the second phase, it solves the combinator…

    Submitted 18 November, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

  26. arXiv:2107.06304  [pdf, other]

    cs.LG cs.CV

    Privacy Vulnerability of Split Computing to Data-Free Model Inversion Attacks

    Authors: Xin Dong, Hongxu Yin, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov, H. T. Kung

    Abstract: Mobile edge devices face increasing demand for deep neural network (DNN) inference while suffering from stringent constraints on computing resources. Split computing (SC) has emerged as a popular approach to this issue, executing only the initial layers on the device and offloading the remaining layers to the cloud. Prior works usually assume that SC offers privacy benefits as only intermediate features, instead o…

    Submitted 24 October, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: A new data-free inversion method to reverse neural networks and get input from intermediate feature maps. BMVC'22

  27. arXiv:2106.05954  [pdf, other]

    cs.CV

    Adversarial Motion Modelling helps Semi-supervised Hand Pose Estimation

    Authors: Adrian Spurr, Pavlo Molchanov, Umar Iqbal, Jan Kautz, Otmar Hilliges

    Abstract: Hand pose estimation is difficult due to different environmental conditions, object- and self-occlusion as well as diversity in hand shape and appearance. Exhaustively covering this wide range of factors in fully annotated datasets has remained impractical, posing significant challenges for generalization of supervised methods. Embracing this challenge, we propose to combine ideas from adversarial…

    Submitted 10 June, 2021; originally announced June 2021.

  28. arXiv:2104.13502  [pdf, other]

    cs.CV

    KAMA: 3D Keypoint Aware Body Mesh Articulation

    Authors: Umar Iqbal, Kevin Xie, Yunrong Guo, Jan Kautz, Pavlo Molchanov

    Abstract: We present KAMA, a 3D Keypoint Aware Mesh Articulation approach that allows us to estimate a human body mesh from the positions of 3D body keypoints. To this end, we learn to estimate 3D positions of 26 body keypoints and propose an analytical solution to articulate a parametric body model, SMPL, via a set of straightforward geometric transformations. Since keypoint estimation directly relies on i…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: Additional qualitative results: https://youtu.be/mPikZEIpUE0

  29. arXiv:2104.07586  [pdf, other]

    cs.LG cs.CV

    See through Gradients: Image Batch Recovery via GradInversion

    Authors: Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov

    Abstract: Training deep neural networks requires gradient estimation from data batches to update parameters. Gradients per parameter are averaged over a set of data and this has been presumed to be safe for privacy-preserving training in joint, collaborative, and federated learning applications. Prior work only showed the possibility of recovering input data given gradients under very restrictive conditions…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 accepted paper

  30. arXiv:2104.04631  [pdf, other]

    cs.CV

    DexYCB: A Benchmark for Capturing Hand Grasping of Objects

    Authors: Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox

    Abstract: We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task: generating saf…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR 2021

  31. arXiv:2003.13764  [pdf, other]

    cs.CV

    Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

    Authors: Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hrúz, Jakub Kanis, Zdeněk Krňoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, et al. (10 additional authors not shown)

    Abstract: We study how well different types of approaches generalise in the task of 3D hand pose estimation under single hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole…

    Submitted 10 September, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

    Comments: European Conference on Computer Vision (ECCV), 2020

  32. arXiv:2003.09282  [pdf, other]

    cs.CV

    Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints

    Authors: Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Otmar Hilliges, Jan Kautz

    Abstract: Estimating 3D hand pose from 2D images is a difficult, inverse problem due to the inherent scale and depth ambiguities. Current state-of-the-art methods train fully supervised deep neural networks with 3D ground-truth data. However, acquiring 3D annotations is expensive, typically requiring calibrated multi-view setups or labor-intensive manual annotations. While annotations of 2D keypoints are mu…

    Submitted 4 August, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

  33. arXiv:2003.07581  [pdf, other]

    cs.CV cs.LG

    Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild

    Authors: Umar Iqbal, Pavlo Molchanov, Jan Kautz

    Abstract: One major challenge for monocular 3D human pose estimation in-the-wild is the acquisition of training data that contains unconstrained images annotated with accurate 3D poses. In this paper, we address this challenge by proposing a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data, which can be acquired easily in in-the-w…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

  34. arXiv:2002.09786  [pdf, other]

    cs.LG cs.CV stat.ML

    HarDNN: Feature Map Vulnerability Evaluation in CNNs

    Authors: Abdulrahman Mahmoud, Siva Kumar Sastry Hari, Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, Pavlo Molchanov, Michael B. Sullivan, Timothy Tsai, Stephen W. Keckler

    Abstract: As Convolutional Neural Networks (CNNs) are increasingly being employed in safety-critical applications, it is important that they behave reliably in the face of hardware errors. Transient hardware errors may percolate undesirable state during execution, resulting in software-manifested errors which can adversely affect high-level decision making. This paper presents HarDNN, a software-directed ap…

    Submitted 25 February, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

    Comments: 14 pages, 5 figures, a short version accepted for publication in First Workshop on Secure and Resilient Autonomy (SARA) co-located with MLSys2020

  35. arXiv:1912.08795  [pdf, other]

    cs.LG cs.CV stat.ML

    Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion

    Authors: Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz

    Abstract: We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network. We 'invert' a trained network (teacher) to synthesize class-conditional input images starting from random noise, without using any additional information about the training dataset. Keeping the teacher fixed, our method optimizes the input while regularizing the distrib…

    Submitted 15 June, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

  36. arXiv:1906.10771  [pdf, other]

    cs.LG cs.CV stat.ML

    Importance Estimation for Neural Network Pruning

    Authors: Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz

    Abstract: Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter's contribution.…

    Submitted 25 June, 2019; originally announced June 2019.

  37. arXiv:1905.01941  [pdf, other]

    cs.CV

    Few-Shot Adaptive Gaze Estimation

    Authors: Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Otmar Hilliges, Jan Kautz

    Abstract: Inter-personal anatomical differences limit the accuracy of person-independent gaze estimation networks. Yet there is a need to lower gaze errors further to enable applications requiring higher quality. Further gains can be achieved by personalizing gaze networks, ideally with few calibration samples. However, over-parameterized neural networks are not amenable to learning from few examples as the…

    Submitted 14 October, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: ICCV 2019 (Oral)

  38. arXiv:1905.01298  [pdf, other]

    cs.CV

    SCOPS: Self-Supervised Co-Part Segmentation

    Authors: Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz

    Abstract: Parts provide a good intermediate representation of objects that is robust with respect to the camera, pose and appearance variations. Existing works on part segmentation are dominated by supervised approaches that rely on large amounts of manual annotations and cannot generalize to unseen object categories. We propose a self-supervised deep learning approach for part segmentation, where we devise…

    Submitted 3 May, 2019; originally announced May 2019.

    Comments: Accepted in CVPR 2019. Project page: http://varunjampani.github.io/scops

  39. arXiv:1904.01636  [pdf, other]

    cs.CV

    Towards annotation-efficient segmentation via image-to-image translation

    Authors: Eugene Vorontsov, Pavlo Molchanov, Christopher Beckham, Jan Kautz, Samuel Kadoury

    Abstract: Often in medical imaging, it is prohibitively challenging to produce enough boundary annotations to train deep neural networks for accurate tumor segmentation. We propose the use of weak labels indicating whether or not an image presents a tumor to extend training over images that lack these annotations. Specifically, we propose a semi-supervised framework that employs unpaired image-to-…

    Submitted 11 June, 2021; v1 submitted 2 April, 2019; originally announced April 2019.

  40. arXiv:1812.08161  [pdf, ps, other]

    physics.acc-ph

    Backward wave oscillator with resonator made of metal foils (photonic BWO)

    Authors: V. Baryshevsky, V. Evdokimov, A. Gurinovich, E. Gurnevich, P. Molchanov

    Abstract: Numerical and experimental analysis of high power microwave generation in photonic BWO, which uses foil photonic crystal, is presented. Single frequency excitation of the below cutoff modes in the photonic BWO is analyzed and demonstrated.

    Submitted 19 December, 2018; originally announced December 2018.

    Comments: 4 pages, 7 eps figures, submitted to IEEE Trans. Plasma Sci

  41. arXiv:1804.10123  [pdf, other]

    cs.CV cs.NE

    IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification

    Authors: Sam Leroux, Pavlo Molchanov, Pieter Simoens, Bart Dhoedt, Thomas Breuel, Jan Kautz

    Abstract: Deep residual networks (ResNets) made a recent breakthrough in deep learning. The core idea of ResNets is to have shortcut connections between layers that allow the network to be much deeper while still being easy to optimize by avoiding vanishing gradients. These shortcut connections have interesting side-effects that make ResNets behave differently from other typical network architectures. In this…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: ICLR 2018 Workshop track

  42. arXiv:1804.09534  [pdf, other]

    cs.CV cs.LG

    Hand Pose Estimation via Latent 2.5D Heatmap Regression

    Authors: Umar Iqbal, Pavlo Molchanov, Thomas Breuel, Juergen Gall, Jan Kautz

    Abstract: Estimating the 3D pose of a hand is an essential part of human-computer interaction. Estimating 3D pose using depth or multi-view sensors has become easier with recent advances in computer vision; however, regressing pose from a single RGB image is much less straightforward. The main difficulty arises from the fact that 3D pose requires some form of depth estimates, which are ambiguous given only…

    Submitted 25 April, 2018; originally announced April 2018.

  43. arXiv:1712.03917  [pdf, other]

    cs.CV

    Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals

    Authors: Shanxin Yuan, Guillermo Garcia-Hernando, Bjorn Stenger, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee, Pavlo Molchanov, Jan Kautz, Sina Honari, Liuhao Ge, Junsong Yuan, Xinghao Chen, Guijin Wang, Fan Yang, Kai Akiyama, Yang Wu, Qingfu Wan, Meysam Madadi, Sergio Escalera, Shile Li, Dongheui Lee, Iason Oikonomidis, Antonis Argyros, Tae-Kyun Kim

    Abstract: In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during ob…

    Submitted 29 March, 2018; v1 submitted 11 December, 2017; originally announced December 2017.

  44. arXiv:1712.00097  [pdf, other]

    cs.CV

    Budget-Aware Activity Detection with A Recurrent Policy Network

    Authors: Behrooz Mahasseni, Xiaodong Yang, Pavlo Molchanov, Jan Kautz

    Abstract: In this paper, we address the challenging problem of efficient temporal activity detection in untrimmed long videos. While most recent work has focused on and advanced detection accuracy, inference can take seconds to minutes to process a single video, which is too slow to be useful in real-world settings. This motivates the proposed budget-aware framework, which learns to perform…

    Submitted 8 May, 2018; v1 submitted 30 November, 2017; originally announced December 2017.

  45. arXiv:1709.01591  [pdf, other]

    cs.CV

    Improving Landmark Localization with Semi-Supervised Learning

    Authors: Sina Honari, Pavlo Molchanov, Stephen Tyree, Pascal Vincent, Christopher Pal, Jan Kautz

    Abstract: We present two techniques to improve landmark localization in images from partially annotated datasets. Our primary goal is to leverage the common situation where precise landmark locations are only provided for a small data subset, but where class labels for classification or regression tasks related to the landmarks are more abundantly available. First, we propose the framework of sequential mul…

    Submitted 28 October, 2018; v1 submitted 5 September, 2017; originally announced September 2017.

    Comments: Published as a conference paper in CVPR 2018

  46. arXiv:1705.07162  [pdf, other]

    cs.CV

    A Lightweight Approach for On-the-Fly Reflectance Estimation

    Authors: Kihwan Kim, Jinwei Gu, Stephen Tyree, Pavlo Molchanov, Matthias Nießner, Jan Kautz

    Abstract: Estimating surface reflectance (BRDF) is one key component of complete 3D scene capture, with wide applications in virtual reality, augmented reality, and human-computer interaction. Prior work is either limited to controlled environments (e.g., gonioreflectometers, light stages, or multi-camera domes) or requires the joint optimization of shape, illumination, and reflectance, which is often compu…

    Submitted 2 April, 2018; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: ICCV 2017

  47. arXiv:1611.06440  [pdf, other]

    cs.LG stat.ML

    Pruning Convolutional Neural Networks for Resource Efficient Inference

    Authors: Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz

    Abstract: We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation, a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induce…

    Submitted 8 June, 2017; v1 submitted 19 November, 2016; originally announced November 2016.

    Comments: 17 pages, 14 figures, ICLR 2017 paper

  48. arXiv:1609.02385  [pdf, other]

    physics.acc-ph physics.plasm-ph

    Experimental Study of a Triode Reflex Geometry Vircator

    Authors: Vladimir Baryshevsky, Alexandra Gurinovich, Evgeny Gurnevich, Pavel Molchanov

    Abstract: A triode reflex geometry vircator operating in the 3.0–4.2 GHz range with an efficiency of up to 6% is developed and experimentally investigated. Shiftable reflectors are shown to enable frequency tuning and output power control. Radiation frequency and power are analyzed for different cathode-anode gap values and varied reflector positions.

    Submitted 8 September, 2016; originally announced September 2016.

    Comments: 4 pages, 14 figures

  49. arXiv:1509.00522  [pdf, ps, other]

    physics.acc-ph physics.plasm-ph

    Cumulation of High-current Electron Beams: Theory and Experiment

    Authors: S. V. Anishchenko, V. G. Baryshevsky, N. A. Belous, A. A. Gurinovich, E. A. Gurinovich, E. A. Gurnevich, P. V. Molchanov

    Abstract: A drastic cumulation of current density caused by electrostatic repulsion in relativistic vacuum diodes with ring-type cathodes is described theoretically and confirmed experimentally. The distinctive feature of the suggested cumulation mechanism over the conventional one, which relies on focusing a high-current beam by its own magnetic field, is a very low energy spread of electrons in the region…

    Submitted 1 September, 2015; originally announced September 2015.

    Comments: Report at the NPCS 2015 conference

  50. arXiv:1408.1824  [pdf, other]

    physics.acc-ph

    Simulation of an axial vircator with a three-cavity resonator

    Authors: P. V. Molchanov, E. A. Gurnevich, V. V. Tikhomirov, S. E. Siahlo

    Abstract: We simulated an axial vircator with a three-cavity resonator and expected a generation efficiency of 6–7 percent. For an adequate description of the physical processes taking place inside a vircator, we used two independent PIC codes: the self-developed INPIC and the free XOOPIC. Based on both the analysis of the vircator proposed in [1] and consideration of the devices operating at cathode-anode voltages under 450 k…

    Submitted 8 August, 2014; originally announced August 2014.

    Comments: 11 pages, 16 figures

    MSC Class: 78A40; ACM Class: J.2