
Showing 151–200 of 233 results for author: Malik, J

  1. arXiv:1812.03982 [pdf, other]

    cs.CV

    SlowFast Networks for Video Recognition

    Authors: Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He

    Abstract: We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our…

    Submitted 29 October, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

    Comments: Technical report

  2. arXiv:1812.02249 [pdf, other]

    physics.med-ph

    Fetal whole-heart 4D imaging using motion-corrected multi-planar real-time MRI

    Authors: Joshua FP van Amerom, David FA Lloyd, Maria Deprez, Anthony N Price, Shaihan J Malik, Kuberan Pushparajah, Milou PM van Poppel, Mary A Rutherford, Reza Razavi, Joseph V Hajnal

    Abstract: Purpose: To develop an MRI acquisition and reconstruction framework for volumetric cine visualisation of the fetal heart and great vessels in the presence of maternal and fetal motion. Methods: Four-dimensional depiction was achieved using a highly-accelerated multi-planar real-time balanced steady state free precession acquisition combined with retrospective image-domain techniques for motion co…

    Submitted 5 December, 2018; originally announced December 2018.

  3. arXiv:1812.01601 [pdf, other]

    cs.CV

    Learning 3D Human Dynamics from Video

    Authors: Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik

    Abstract: From an image of a person in action, we can easily guess the 3D motion of the person in the immediate past and future. This is because we have a mental model of 3D human dynamics that we have acquired from observing visual sequences of humans in motion. We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding…

    Submitted 16 September, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: To appear in CVPR 2019. Changelog: v3. +an experiment to compare improvement from pseudo-gt data on single view vs temporal context model. v2. camready ver: Minor update in model training where the gaussian shape prior is used, updated results (similar results, same trends), added more ablation study in the appendix. v1. +evaluation protocol subsection in appendix, updated results due to bug fix

  4. arXiv:1812.00940 [pdf, other]

    cs.CV cs.LG cs.RO

    Visual Memory for Robust Path Following

    Authors: Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

    Abstract: Humans routinely retrace paths in a novel environment both forwards and backwards despite uncertainty in their motion. This paper presents an approach for doing so. Given a demonstration of a path, a first network generates a path abstraction. Equipped with this abstraction, a second network observes the world and decides how to act to retrace the path under noisy actuation and a changing environm…

    Submitted 3 December, 2018; originally announced December 2018.

    Comments: Neural Information Processing Systems (NeurIPS) 2018. Oral Presentation

  5. arXiv:1811.12569 [pdf, other]

    cs.LG cs.CV stat.ML

    Are All Training Examples Created Equal? An Empirical Study

    Authors: Kailas Vodrahalli, Ke Li, Jitendra Malik

    Abstract: Modern computer vision algorithms often rely on very large training datasets. However, it is conceivable that a carefully selected subsample of the dataset is sufficient for training. In this paper, we propose a gradient-based importance measure that we use to empirically analyze relative importance of training images in four datasets of varying complexity. We find that in some cases, a small subs…

    Submitted 29 November, 2018; originally announced November 2018.

    Comments: 12 pages, 12 figures

  6. arXiv:1811.12402 [pdf, ps, other]

    cs.LG cs.CV stat.ML

    On the Implicit Assumptions of GANs

    Authors: Ke Li, Jitendra Malik

    Abstract: Generative adversarial nets (GANs) have generated a lot of excitement. Despite their popularity, they exhibit a number of well-documented issues in practice, which apparently contradict theoretical guarantees. A number of enlightening papers have pointed out that these issues arise from unjustified assumptions that are commonly made, but the message seems to have been lost amid the optimism of rec…

    Submitted 29 November, 2018; originally announced November 2018.

    Comments: 8 pages

  7. arXiv:1811.12373 [pdf, other]

    cs.CV cs.GR cs.LG

    Diverse Image Synthesis from Semantic Layouts via Conditional IMLE

    Authors: Ke Li, Tianhao Zhang, Jitendra Malik

    Abstract: Most existing methods for conditional image synthesis are only able to generate a single plausible image for any given input, or at best a fixed number of plausible images. In this paper, we focus on the problem of generating images from semantic segmentation maps and present a simple new method that can generate an arbitrary number of images with diverse appearance for the same semantic layout. U…

    Submitted 29 August, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: 18 pages, 16 figures; IEEE International Conference on Computer Vision (ICCV), 2019

  8. arXiv:1811.11074 [pdf, other]

    physics.data-an eess.SP stat.AP

    Recycling cardiogenic artifacts in impedance pneumography

    Authors: Yao Lu, Hau-tieng Wu, John Malik

    Abstract: Purpose: Biomedical sensors often exhibit cardiogenic artifacts which, while distorting the signal of interest, carry useful hemodynamic information. We propose an algorithm to remove and extract hemodynamic information from these cardiogenic artifacts. Methods: We apply a nonlinear time-frequency analysis technique, the de-shape synchrosqueezing transform (dsSST), to adaptively isolate the high-…

    Submitted 26 February, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: 21 pages, 6 figures

  9. arXiv:1810.03599 [pdf, other]

    cs.GR cs.CV cs.LG

    SFV: Reinforcement Learning of Physical Skills from Videos

    Authors: Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine

    Abstract: Data-driven character animation based on motion capture can produce highly naturalistic behaviors and, when combined with physics simulation, can provide for natural procedural responses to physical perturbations, environmental changes, and morphological discrepancies. Motion capture remains the most popular source of motion data, but collecting mocap data typically requires heavily instrumented e…

    Submitted 15 October, 2018; v1 submitted 8 October, 2018; originally announced October 2018.

  10. arXiv:1810.01406 [pdf, other]

    cs.LG cs.CV cs.GR cs.NE stat.ML

    Super-Resolution via Conditional Implicit Maximum Likelihood Estimation

    Authors: Ke Li, Shichong Peng, Jitendra Malik

    Abstract: Single-image super-resolution (SISR) is a canonical problem with diverse applications. Leading methods like SRGAN produce images that contain various artifacts, such as high-frequency noise, hallucinated colours and shape distortions, which adversely affect the realism of the result. In this paper, we propose an alternative approach based on an extension of the method of Implicit Maximum Likelihoo…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: 12 pages, 7 figures

  11. arXiv:1809.09087 [pdf, other]

    cs.LG cs.NE stat.ML

    Implicit Maximum Likelihood Estimation

    Authors: Ke Li, Jitendra Malik

    Abstract: Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood un…

    Submitted 22 October, 2018; v1 submitted 24 September, 2018; originally announced September 2018.

    Comments: 21 pages, 4 figures. In the interest of promoting discussion, we make the reviews available at https://people.eecs.berkeley.edu/~ke.li/papers/imle_reviews.pdf

  12. arXiv:1809.02882 [pdf, other]

    cs.CV cs.LG

    Cost-Sensitive Active Learning for Intracranial Hemorrhage Detection

    Authors: Weicheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

    Abstract: Deep learning for clinical applications is subject to stringent performance requirements, which raises a need for large labeled datasets. However, the enormous cost of labeling medical data makes this challenging. In this paper, we build a cost-sensitive active learning system for the problem of intracranial hemorrhage detection and segmentation on head computed tomography (CT). We show that our e…

    Submitted 8 September, 2018; originally announced September 2018.

  13. arXiv:1808.10654 [pdf, other]

    cs.AI cs.CV cs.GR cs.LG cs.RO

    Gibson Env: Real-World Perception for Embodied Agents

    Authors: Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese

    Abstract: Developing visual perception models for active agents and sensorimotor control are cumbersome to be done in the physical world, as existing algorithms are too slow to efficiently learn in real-time and robots are fragile and costly. This has given rise to learning-in-simulation which consequently casts a question on whether the results transfer to real-world. In this paper, we are concerned with t…

    Submitted 31 August, 2018; originally announced August 2018.

    Comments: Access the code, dataset, and project website at http://gibsonenv.vision/ . CVPR 2018

    Journal ref: CVPR 2018

  14. arXiv:1808.09208 [pdf, other]

    cs.CV

    DeepHPS: End-to-end Estimation of 3D Hand Pose and Shape by Learning from Synthetic Depth

    Authors: Jameel Malik, Ahmed Elhayek, Fabrizio Nunnari, Kiran Varanasi, Kiarash Tamaddon, Alexis Heloir, Didier Stricker

    Abstract: Articulated hand pose and shape estimation is an important problem for vision-based applications such as augmented reality and animation. In contrast to the existing methods which optimize only for joint positions, we propose a fully supervised deep network which learns to jointly estimate a full 3D hand mesh representation and pose from a single depth image. To this end, a CNN architecture is emp…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: Accepted for publication in 3DV-2018 (http://3dv18.uniud.it/)

  15. arXiv:1808.00142 [pdf, other]

    stat.AP physics.data-an stat.ML

    Sleep-wake classification via quantifying heart rate variability by convolutional neural network

    Authors: John Malik, Yu-Lun Lo, Hau-tieng Wu

    Abstract: Fluctuations in heart rate are intimately tied to changes in the physiological state of the organism. We examine and exploit this relationship by classifying a human subject's wake/sleep status using his instantaneous heart rate (IHR) series. We use a convolutional neural network (CNN) to build features from the IHR series extracted from a whole-night electrocardiogram (ECG) and predict every 30 s…

    Submitted 31 July, 2018; originally announced August 2018.

  16. arXiv:1807.06757 [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    On Evaluation of Embodied Navigation Agents

    Authors: Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

    Abstract: Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence. The past two years have seen a surge of creative work on navigation. This creative output has produced a plethora of sometimes incompatible task definitions and evaluation protocols. To coordinate ongoing and future research in this area, we have convened a working group to study emp…

    Submitted 17 July, 2018; originally announced July 2018.

    Comments: Report of a working group on empirical methodology in navigation research. Authors are listed in alphabetical order

  17. arXiv:1806.08354 [pdf, other]

    cs.CV cs.AI cs.LG cs.RO stat.ML

    Learning Instance Segmentation by Interaction

    Authors: Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

    Abstract: We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions g…

    Submitted 21 June, 2018; originally announced June 2018.

    Comments: Website at https://pathak22.github.io/seg-by-interaction/

  18. arXiv:1806.03265 [pdf, other]

    cs.CV

    PatchFCN for Intracranial Hemorrhage Detection

    Authors: Weicheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

    Abstract: This paper studies the problem of detecting and segmenting acute intracranial hemorrhage on head computed tomography (CT) scans. We propose to solve both tasks as a semantic segmentation problem using a patch-based fully convolutional network (PatchFCN). This formulation allows us to accurately localize hemorrhages while bypassing the complexity of object detection. Our system demonstrates competi…

    Submitted 14 April, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

  19. arXiv:1805.11085 [pdf, other]

    cs.RO cs.LG stat.ML

    More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

    Authors: Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H. Adelson, Sergey Levine

    Abstract: For humans, the process of grasping an object relies heavily on rich tactile feedback. Most recent robotic grasping work, however, has been based only on visual input, and thus cannot easily benefit from feedback after initiating contact. In this paper, we investigate how a robot can learn to use tactile information to iteratively and efficiently adjust its grasp. To this end, we propose an end-to…

    Submitted 26 July, 2018; v1 submitted 28 May, 2018; originally announced May 2018.

    Comments: 8 pages. Published in IEEE Robotics and Automation Letters (RAL). Website: https://sites.google.com/view/more-than-a-feeling

  20. Colouring $(P_r+P_s)$-Free Graphs

    Authors: Tereza Klimošová, Josef Malík, Tomáš Masařík, Jana Novotná, Daniël Paulusma, Veronika Slívová

    Abstract: The $k$-Colouring problem is to decide if the vertices of a graph can be coloured with at most $k$ colours for a fixed integer $k$ such that no two adjacent vertices are coloured alike. If each vertex $u$ must be assigned a colour from a prescribed list $L(u) \subseteq \{1,\cdots, k\}$, then we obtain the List $k$-Colouring problem. A graph $G$ is $H$-free if $G$ does not contain $H$ as an induced s…

    Submitted 16 March, 2021; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: 20 pages, 6 figures. An extended abstract of this paper appeared in the proceedings of ISAAC 2018

    Journal ref: Algorithmica 82(7), 1833-1858 (2020)

  21. arXiv:1804.08606 [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Zero-Shot Visual Imitation

    Authors: Deepak Pathak, Parsa Mahmoudieh, Guanghao Luo, Pulkit Agrawal, Dian Chen, Yide Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell

    Abstract: The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both 'what' and 'how' to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss. In our framework, the role of the expert…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: Oral presentation at ICLR 2018. Website at https://pathak22.github.io/zeroshot-imitation/

  22. arXiv:1804.08328 [pdf, other]

    cs.CV cs.AI cs.LG cs.NE cs.RO

    Taskonomy: Disentangling Task Transfer Learning

    Authors: Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

    Abstract: Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies acros…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: CVPR 2018 (Oral). See project website and live demos at http://taskonomy.vision/

  23. arXiv:1804.02811 [pdf, other]

    math.ST

    Connecting Dots -- from Local Covariance to Empirical Intrinsic Geometry and Locally Linear Embedding

    Authors: John Malik, Chao Shen, Hau-Tieng Wu, Nan Wu

    Abstract: Local covariance structure under the manifold setup has been widely applied in the machine learning society. Based on the established theoretical results, we provide an extensive study of two relevant manifold learning algorithms, empirical intrinsic geometry (EIG) and the locally linear embedding (LLE) under the manifold setup. Particularly, we show that without an accurate dimension estimation,…

    Submitted 8 February, 2019; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: 25 pages, 4 figures

    MSC Class: 62-07

  24. arXiv:1803.07549 [pdf, other]

    cs.CV

    Learning Category-Specific Mesh Reconstruction from Image Collections

    Authors: Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

    Abstract: We present a learning framework for recovering the 3D shape, camera, and texture of an object from a single image. The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation. Our approach allows leveraging an annotated image collection for training, where the deformable model and the 3D…

    Submitted 30 July, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

    Comments: Project URL: https://akanazawa.github.io/cmr/

  25. arXiv:1803.01710 [pdf, other]

    eess.SP physics.data-an stat.AP

    Diffuse to fuse EEG spectra -- intrinsic geometry of sleep dynamics for classification

    Authors: Gi-Ren Liu, Yu-Lun Lo, John Malik, Yuan-Chung Sheu, Hau-tieng Wu

    Abstract: We propose a novel algorithm for sleep dynamics visualization and automatic annotation by applying diffusion geometry based sensor fusion algorithm to fuse spectral information from two electroencephalograms (EEG). The diffusion geometry approach helps organize the nonlinear dynamical structure hidden in the EEG signal. The visualization is achieved by the nonlinear dimension reduction capability…

    Submitted 6 May, 2019; v1 submitted 28 February, 2018; originally announced March 2018.

  26. arXiv:1801.03910 [pdf, other]

    cs.CV

    Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction

    Authors: Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

    Abstract: We present a framework for learning single-view shape and pose prediction without using direct supervision for either. Our approach allows leveraging multi-view observations from unknown poses as supervisory signal during training. Our proposed training setup enforces geometric consistency between the independently predicted shape and pose from two views of the same instance. We consequently learn…

    Submitted 24 April, 2018; v1 submitted 11 January, 2018; originally announced January 2018.

    Comments: Project url with code: https://shubhtuls.github.io/mvcSnP/

  27. arXiv:1712.08125 [pdf, other]

    cs.CV cs.LG cs.RO

    Unifying Map and Landmark Based Representations for Visual Navigation

    Authors: Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

    Abstract: This work presents a formulation for visual navigation that unifies map based spatial reasoning and path planning, with landmark based robust plan execution in noisy environments. Our proposed formulation is learned from data and is thus able to leverage statistical regularities of the world. This allows it to efficiently navigate in novel environments given only a sparse set of registered images…

    Submitted 21 December, 2017; originally announced December 2017.

    Comments: Project page with videos: https://s-gupta.github.io/cmpl/

  28. arXiv:1712.06584 [pdf, other]

    cs.CV

    End-to-end Recovery of Human Shape and Pose

    Authors: Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik

    Abstract: We describe Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods that compute 2D or 3D joint locations, we produce a richer and more useful mesh representation that is parameterized by shape and 3D joint angles. The main objective is to minimize the reprojection loss of keypoints, which all…

    Submitted 23 June, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

    Comments: CVPR 2018, Project page with code: https://akanazawa.github.io/hmr/

  29. arXiv:1712.03121 [pdf, other]

    cs.HC cs.CV

    Simultaneous Hand Pose and Skeleton Bone-Lengths Estimation from a Single Depth Image

    Authors: Jameel Malik, Ahmed Elhayek, Didier Stricker

    Abstract: Articulated hand pose estimation is a challenging task for human-computer interaction. The state-of-the-art hand pose estimation algorithms work only with one or a few subjects for which they have been calibrated or trained. Particularly, the hybrid methods based on learning followed by model fitting or model based deep learning do not explicitly consider varying hand shapes and sizes. In this wor…

    Submitted 8 December, 2017; originally announced December 2017.

    Comments: This paper has been accepted and presented in 3DV-2017 conference held at Qingdao, China. http://irc.cs.sdu.edu.cn/3dv/

  30. arXiv:1712.02310 [pdf, other]

    cs.CV

    From Lifestyle Vlogs to Everyday Interactions

    Authors: David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik

    Abstract: A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is lack of good training data. Most past efforts have gathered this data explicitly: starting with a laundry list of action labels, and then querying search engines for videos tagged with each label. In this work, we do the reverse and search implicitly: we start wit…

    Submitted 6 December, 2017; originally announced December 2017.

    Comments: Project page at: http://people.eecs.berkeley.edu/~dfouhey/2017/VLOG/

  31. arXiv:1712.01812 [pdf, other]

    cs.CV

    Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

    Authors: Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik

    Abstract: The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose. We propose a convolutional neural network-based approach to predict this representation and benchmark it on a large dataset of indoor scenes. Our experiments e…

    Submitted 24 April, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

    Comments: Project url with code: https://shubhtuls.github.io/factored3d

  32. arXiv:1710.08247 [pdf, other]

    cs.CV cs.LG cs.NE cs.RO

    Generic 3D Representation via Pose Estimation and Matching

    Authors: Amir R. Zamir, Tilman Wekel, Pulkit Agrawal, Colin Weil, Jitendra Malik, Silvio Savarese

    Abstract: Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D has been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the prem…

    Submitted 23 October, 2017; originally announced October 2017.

    Comments: Published in ECCV16. See the project website http://3drepresentation.stanford.edu/ and dataset website https://github.com/amir32002/3D_Street_View

    Journal ref: ECCV 2016 535-553

  33. arXiv:1710.06104 [pdf, other]

    cs.CV

    Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55

    Authors: Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, et al. (25 additional authors not shown)

    Abstract: We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database. The benchmark consists of two tasks: part-level segmentation of 3D shapes and 3D reconstruction from single view images. Ten teams have participated in the challenge and the best performing teams have outperformed state-of-the-art approaches on both tasks. A few novel deep learni…

    Submitted 27 October, 2017; v1 submitted 17 October, 2017; originally announced October 2017.

  34. arXiv:1709.00832 [pdf]

    physics.med-ph

    Extended Phase Graph formalism for systems with Magnetization Transfer and Chemical Exchange

    Authors: Shaihan J. Malik, Rui P. A. G. Teixeira, Joseph V. Hajnal

    Abstract: An Extended Phase Graph framework for modelling systems with exchange or magnetization transfer (MT) is proposed. The framework, referred to as EPG-X, models coupled two-compartment systems by describing each compartment with separate phase graphs that exchange during evolution periods. There are two variants: EPG-X(BM) for systems governed by the Bloch-McConnell equations; and EPG-X(MT) for the p…

    Submitted 4 September, 2017; originally announced September 2017.

    Comments: For associated code see https://github.com/mriphysics/EPG-X

  35. arXiv:1708.05375 [pdf, other]

    cs.CV

    Learning a Multi-View Stereo Machine

    Authors: Abhishek Kar, Christian Häne, Jitendra Malik

    Abstract: We present a learnt system for multi-view stereopsis. In contrast to recent learning based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end lear…

    Submitted 17 August, 2017; originally announced August 2017.

  36. arXiv:1705.08421 [pdf, other]

    cs.CV

    AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

    Authors: Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

    Abstract: This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual…

    Submitted 30 April, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: To appear in CVPR 2018. Check dataset page https://research.google.com/ava/ for details

  37. arXiv:1704.06254 [pdf, other]

    cs.CV

    Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency

    Authors: Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik

    Abstract: We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view. We do so by reformulating view consistency using a differentiable ray consistency (DRC) term. We show that this formulation can be incorporated in a learning framework to leverage different…

    Submitted 20 April, 2017; originally announced April 2017.

    Comments: To appear at CVPR 2017. Project webpage: https://shubhtuls.github.io/drc/

  38. arXiv:1704.00710 [pdf, other]

    cs.CV

    Hierarchical Surface Prediction for 3D Object Reconstruction

    Authors: Christian Häne, Shubham Tulsiani, Jitendra Malik

    Abstract: Recently, Convolutional Neural Networks have shown promising results for 3D geometry prediction. They can make predictions from very little input data such as a single color image. A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well. We propose a general framework, called hierarchical surface predict…

    Submitted 6 November, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

    Comments: 3DV 2017

  39. arXiv:1703.02018 [pdf, other]

    cs.CV cs.LG cs.RO

    Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation

    Authors: Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine

    Abstract: Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics. We present a learning-based system where a robot takes as input a sequence of images of a human manipulating a rope from an initial to goal configuration, and outputs a sequence of actions that can reproduce the human demonstration, using only monocular images as input. To perform this…

    Submitted 6 March, 2017; originally announced March 2017.

    Comments: 8 pages, accepted to International Conference on Robotics and Automation (ICRA) 2017

  40. arXiv:1703.00441 [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Learning to Optimize Neural Nets

    Authors: Ke Li, Jitendra Malik

    Abstract: Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning. In this paper, we explore learning an optimization algorithm for training shallow neural nets. Such high-dimensional stochastic optimization problems present interesting challenges for existing reinforcement learning algorithms. We develop an extension that is suited to learning…

    Submitted 30 November, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

    Comments: 10 pages, 15 figures

  41. arXiv:1703.00440

    cs.LG cs.AI cs.DS cs.IR stat.ML

    Fast k-Nearest Neighbour Search via Prioritized DCI

    Authors: Ke Li, Jitendra Malik

    Abstract: Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear.…

    Submitted 20 July, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

    Comments: 14 pages, 6 figures; International Conference on Machine Learning (ICML), 2017

  42. arXiv:1702.08638

    physics.data-an q-bio.QM stat.AP

    Single-lead f-wave extraction using diffusion geometry

    Authors: John Malik, Neil Reed, Chun-Li Wang, Hautieng Wu

    Abstract: A novel single-lead f-wave extraction algorithm based on the modern diffusion geometry data analysis framework is proposed. The algorithm is essentially an averaged beat subtraction algorithm, where the ventricular activity template is estimated by combining a newly designed metric, the "diffusion distance," and the non-local Euclidean median based on the non-linear manifold setup. We coined the a…

    Submitted 28 April, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: 31 pages, 8 figures

  43. arXiv:1702.03920

    cs.CV cs.AI cs.LG cs.RO

    Cognitive Mapping and Planning for Visual Navigation

    Authors: Saurabh Gupta, Varun Tolani, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik

    Abstract: We introduce a neural architecture for navigation in novel environments. Our proposed architecture learns to map from first-person views and plans a sequence of actions towards goals in the environment. The Cognitive Mapper and Planner (CMP) is based on two key ideas: a) a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the task, and b) a spatia…

    Submitted 7 February, 2019; v1 submitted 13 February, 2017; originally announced February 2017.

    Comments: Extended IJCV Version of the original paper at CVPR17. Project website with code, models, simulation environment and videos: https://sites.google.com/view/cognitive-mapping-and-planning/

  44. arXiv:1612.09508

    cs.CV

    Feedback Networks

    Authors: Amir R. Zamir, Te-Lin Wu, Lin Sun, William Shen, Jitendra Malik, Silvio Savarese

    Abstract: Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer. This is usually actualized through feedforward multilayer neural networks, e.g. ConvNets, where each layer forms one of such successive representations. However, an alternative that can achieve the same goal is a feedback based approach in which the repre…

    Submitted 20 August, 2017; v1 submitted 30 December, 2016; originally announced December 2016.

    Comments: See a video describing the method at https://youtu.be/MY5Uhv38Ttg and the website at http://feedbacknet.stanford.edu/

  45. arXiv:1612.06851

    cs.CV cs.LG

    Beyond Skip Connections: Top-Down Modulation for Object Detection

    Authors: Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, Abhinav Gupta

    Abstract: In recent years, we have seen tremendous progress in the field of object detection. Most of the recent improvements have been achieved by targeting deeper feedforward networks. However, many hard object categories such as bottle, remote, etc. require representation of fine details and not just coarse, semantic representations. But most of these fine details are lost in the early convolutional laye…

    Submitted 19 September, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

  46. arXiv:1612.00404

    cs.CV

    Learning Shape Abstractions by Assembling Volumetric Primitives

    Authors: Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik

    Abstract: We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives. In addition to generating simple and geometrically interpretable explanations of 3D objects, our framework also allows us to automatically discover and exploit consistent structure in the data. We demonstrate that using our method allows predicting shape representations wh…

    Submitted 2 August, 2018; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: Project url: https://shubhtuls.github.io/volumetricPrimitives/

  47. arXiv:1606.07419

    cs.CV cs.AI cs.RO

    Learning to Poke by Poking: Experiential Learning of Intuitive Physics

    Authors: Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine

    Abstract: We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 100K pokes on different objects. We propose a novel approach based on deep neural networks for mo…

    Submitted 15 February, 2017; v1 submitted 23 June, 2016; originally announced June 2016.

    Journal ref: NIPS 2016

  48. arXiv:1606.01885

    cs.LG cs.AI math.OC stat.ML

    Learning to Optimize

    Authors: Ke Li, Jitendra Malik

    Abstract: Algorithm design is a laborious process and often requires many iterations of ideation and validation. In this paper, we explore automating algorithm design and present a method to learn an optimization algorithm, which we believe to be the first method that can automatically discover a better algorithm. We approach this problem from a reinforcement learning perspective and represent any particula…

    Submitted 6 June, 2016; originally announced June 2016.

    Comments: 9 pages, 3 figures

  49. arXiv:1605.03557

    cs.CV

    View Synthesis by Appearance Flow

    Authors: Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros

    Abstract: We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the…

    Submitted 11 February, 2017; v1 submitted 11 May, 2016; originally announced May 2016.

  50. arXiv:1604.08202

    cs.CV

    Amodal Instance Segmentation

    Authors: Ke Li, Jitendra Malik

    Abstract: We consider the problem of amodal instance segmentation, the objective of which is to predict the region encompassing both visible and occluded parts of each object. Thus far, the lack of publicly available amodal segmentation annotations has stymied the development of amodal segmentation methods. In this paper, we sidestep this issue by relying solely on standard modal instance segmentation annot…

    Submitted 17 August, 2016; v1 submitted 27 April, 2016; originally announced April 2016.

    Comments: 23 pages, 14 figures; European Conference on Computer Vision (ECCV), 2016